install.packages("HH")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6/HH_3.1-40.tgz'
Content type 'application/x-gzip' length 1760082 bytes (1.7 MB)
==================================================
downloaded 1.7 MB
The downloaded binary packages are in
/var/folders/58/m5fvfpw93rz5rtzg6nc5_lbr0000gn/T//RtmpfHCbnT/downloaded_packages
install.packages("bestNormalize")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6/bestNormalize_1.6.1.tgz'
Content type 'application/x-gzip' length 967063 bytes (944 KB)
==================================================
downloaded 944 KB
The downloaded binary packages are in
/var/folders/58/m5fvfpw93rz5rtzg6nc5_lbr0000gn/T//RtmpfHCbnT/downloaded_packages
library(HH)
Loading required package: lattice
Loading required package: grid
Loading required package: latticeExtra
Loading required package: multcomp
package ‘multcomp’ was built under R version 3.6.2
Loading required package: mvtnorm
package ‘mvtnorm’ was built under R version 3.6.2
Loading required package: survival
package ‘survival’ was built under R version 3.6.2
Loading required package: TH.data
Loading required package: MASS
package ‘MASS’ was built under R version 3.6.2
Attaching package: ‘TH.data’
The following object is masked from ‘package:MASS’:
geyser
Loading required package: gridExtra
Registered S3 methods overwritten by 'htmltools':
method from
print.html tools:rstudio
print.shiny.tag tools:rstudio
print.shiny.tag.list tools:rstudio
replacing previous import ‘vctrs::data_frame’ by ‘tibble::data_frame’ when loading ‘dplyr’
Registered S3 method overwritten by 'htmlwidgets':
method from
print.htmlwidget tools:rstudio
Registered S3 method overwritten by 'data.table':
method from
print.data.table
library(GGally)
package ‘GGally’ was built under R version 3.6.2
Loading required package: ggplot2
package ‘ggplot2’ was built under R version 3.6.2
Attaching package: ‘ggplot2’
The following object is masked from ‘package:latticeExtra’:
layer
Registered S3 method overwritten by 'GGally':
method from
+.gg ggplot2
library(bestNormalize)
package ‘bestNormalize’ was built under R version 3.6.2
Attaching package: ‘bestNormalize’
The following object is masked from ‘package:MASS’:
boxcox
library(dplyr)
package ‘dplyr’ was built under R version 3.6.2
Attaching package: ‘dplyr’
The following object is masked from ‘package:gridExtra’:
combine
The following object is masked from ‘package:MASS’:
select
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
library(janitor)
package ‘janitor’ was built under R version 3.6.2
Attaching package: ‘janitor’
The following objects are masked from ‘package:stats’:
chisq.test, fisher.test
library(leaps)
library(tidyverse)
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
Found more than one class "atomicVector" in cache; using the first, from namespace 'Matrix'
Also defined by ‘Rmpfr’
── Attaching packages ────────────────────────────────────────────── tidyverse 1.3.0 ──
✓ tibble  3.0.3     ✓ purrr   0.3.4
✓ tidyr   1.1.0     ✓ stringr 1.4.0
✓ readr   1.3.1     ✓ forcats 0.5.0
package ‘tibble’ was built under R version 3.6.2
package ‘tidyr’ was built under R version 3.6.2
package ‘purrr’ was built under R version 3.6.2
── Conflicts ───────────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::combine() masks gridExtra::combine()
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
x ggplot2::layer() masks latticeExtra::layer()
x dplyr::select() masks MASS::select()
x purrr::transpose() masks HH::transpose()
library(dplyr)
library(modelr)
package ‘modelr’ was built under R version 3.6.2
install.packages("glmulti")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6/glmulti_1.0.8.tgz'
Content type 'application/x-gzip' length 250892 bytes (245 KB)
==================================================
downloaded 245 KB
The downloaded binary packages are in
/var/folders/58/m5fvfpw93rz5rtzg6nc5_lbr0000gn/T//RtmpfHCbnT/downloaded_packages
library(glmulti)
package ‘glmulti’ was built under R version 3.6.2
Loading required package: rJava
package ‘rJava’ was built under R version 3.6.2
avacado <- read.csv("data/avocado.csv") %>% clean_names()
head(avacado)
avacado2 <- avacado %>%
dplyr::select(-c("date", "region", "x"))
head(avacado2)
regsubsets_forward <- regsubsets(average_price ~ ., data = avacado2, nvmax = 10, method = "forward")
sum_regsubsets_forward <- summary(regsubsets_forward)
sum_regsubsets_forward
Subset selection object
Call: regsubsets.formula(average_price ~ ., data = avacado2, nvmax = 10,
method = "forward")
10 Variables (and intercept)
Forced in Forced out
total_volume FALSE FALSE
x4046 FALSE FALSE
x4225 FALSE FALSE
x4770 FALSE FALSE
total_bags FALSE FALSE
small_bags FALSE FALSE
large_bags FALSE FALSE
x_large_bags FALSE FALSE
typeorganic FALSE FALSE
year FALSE FALSE
1 subsets of each size up to 10
Selection Algorithm: forward
total_volume x4046 x4225 x4770 total_bags small_bags large_bags x_large_bags
1 ( 1 ) " " " " " " " " " " " " " " " "
2 ( 1 ) " " " " " " " " " " " " " " " "
3 ( 1 ) " " "*" " " " " " " " " " " " "
4 ( 1 ) " " "*" "*" " " " " " " " " " "
5 ( 1 ) " " "*" "*" "*" " " " " " " " "
6 ( 1 ) " " "*" "*" "*" " " " " " " "*"
7 ( 1 ) " " "*" "*" "*" " " " " "*" "*"
8 ( 1 ) " " "*" "*" "*" "*" " " "*" "*"
9 ( 1 ) "*" "*" "*" "*" "*" " " "*" "*"
10 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*"
typeorganic year
1 ( 1 ) "*" " "
2 ( 1 ) "*" "*"
3 ( 1 ) "*" "*"
4 ( 1 ) "*" "*"
5 ( 1 ) "*" "*"
6 ( 1 ) "*" "*"
7 ( 1 ) "*" "*"
8 ( 1 ) "*" "*"
9 ( 1 ) "*" "*"
10 ( 1 ) "*" "*"
Each row of the summary is the best model found for that number of predictors; the asterisks mark which predictors that model includes.
# plotting this shows us the adjusted r2 values and which variables are in the model. Top row shows model with highest adjusted r2
plot(regsubsets_forward, scale = "adjr2")
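With method = "forward" each row of the asterisk table keeps every predictor from the row above and adds one more. A minimal base-R sketch of that greedy idea follows. It uses the built-in mtcars data rather than the avocado data, the helper forward_select() is hypothetical, and it is only meant to illustrate the search, not to reproduce how leaps implements it.

```r
# Greedy forward selection on the built-in mtcars data (illustrative only).
# forward_select() is a hypothetical helper, not part of leaps.
forward_select <- function(data, response, max_vars) {
  candidates <- setdiff(names(data), response)
  chosen <- character(0)
  steps <- vector("list", max_vars)
  for (k in seq_len(max_vars)) {
    # Fit one model per remaining candidate and record its residual sum of squares
    rss <- sapply(candidates, function(v) {
      f <- reformulate(c(chosen, v), response = response)
      sum(resid(lm(f, data = data))^2)
    })
    # Keep the candidate that lowers the RSS the most
    best <- names(which.min(rss))
    chosen <- c(chosen, best)
    candidates <- setdiff(candidates, best)
    steps[[k]] <- chosen
  }
  steps
}

steps <- forward_select(mtcars, "mpg", max_vars = 3)
steps[[1]]  # the single predictor with the lowest RSS
steps[[3]]  # contains everything in steps[[2]], which contains steps[[1]]
```

Because each step only ever adds to the previous set, forward selection can miss a combination that an exhaustive search would find, which is why the notebook also runs the backward and exhaustive methods.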
sum_regsubsets_forward$which
(Intercept) total_volume x4046 x4225 x4770 total_bags small_bags large_bags
1 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
2 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
3 TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
4 TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
5 TRUE FALSE TRUE TRUE TRUE FALSE FALSE FALSE
6 TRUE FALSE TRUE TRUE TRUE FALSE FALSE FALSE
7 TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE
8 TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE
9 TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
10 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
x_large_bags typeorganic year
1 FALSE TRUE FALSE
2 FALSE TRUE TRUE
3 FALSE TRUE TRUE
4 FALSE TRUE TRUE
5 FALSE TRUE TRUE
6 TRUE TRUE TRUE
7 TRUE TRUE TRUE
8 TRUE TRUE TRUE
9 TRUE TRUE TRUE
10 TRUE TRUE TRUE
regsubsets_backward <- regsubsets(average_price ~ ., data = avacado2, nvmax = 10, method = "backward")
# plotting this shows us the adjusted r2 values and which variables are in the model. Top row shows model with highest adjusted r2
plot(regsubsets_backward, scale = "adjr2")
regsubsets_exhaustive <- regsubsets(average_price ~ ., data = avacado2, nvmax = 10, method = "exhaustive")
# plotting this shows us the adjusted r2 values and which variables are in the model. Top row shows model with highest adjusted r2
plot(regsubsets_exhaustive, scale = "adjr2")
summary(regsubsets_exhaustive)$which[10,]
(Intercept) total_volume x4046 x4225 x4770 total_bags
TRUE TRUE TRUE TRUE TRUE TRUE
small_bags large_bags x_large_bags typeorganic year
TRUE TRUE TRUE TRUE TRUE
summary(regsubsets_backward)$which[10,]
(Intercept) total_volume x4046 x4225 x4770 total_bags
TRUE TRUE TRUE TRUE TRUE TRUE
small_bags large_bags x_large_bags typeorganic year
TRUE TRUE TRUE TRUE TRUE
summary(regsubsets_forward)$which[10,]
(Intercept) total_volume x4046 x4225 x4770 total_bags
TRUE TRUE TRUE TRUE TRUE TRUE
small_bags large_bags x_large_bags typeorganic year
TRUE TRUE TRUE TRUE TRUE
avacado2 %>%
ggplot(aes(x = average_price)) +
geom_histogram()
avacado2 %>%
ggplot(aes(x = log10(average_price))) +
geom_histogram()
CODECLAN SOLUTION
avocados <- clean_names(read_csv("data/avocado.csv"))
Missing column names filled in: 'X1' [1]
Parsed with column specification:
cols(
  X1 = col_double(),
  Date = col_date(format = ""),
  AveragePrice = col_double(),
  `Total Volume` = col_double(),
  `4046` = col_double(),
  `4225` = col_double(),
  `4770` = col_double(),
  `Total Bags` = col_double(),
  `Small Bags` = col_double(),
  `Large Bags` = col_double(),
  `XLarge Bags` = col_double(),
  type = col_character(),
  year = col_double(),
  region = col_character()
)
summary(avocados)
x1 date average_price total_volume
Min. : 0.00 Min. :2015-01-04 Min. :0.440 Min. : 85
1st Qu.:10.00 1st Qu.:2015-10-25 1st Qu.:1.100 1st Qu.: 10839
Median :24.00 Median :2016-08-14 Median :1.370 Median : 107377
Mean :24.23 Mean :2016-08-13 Mean :1.406 Mean : 850644
3rd Qu.:38.00 3rd Qu.:2017-06-04 3rd Qu.:1.660 3rd Qu.: 432962
Max. :52.00 Max. :2018-03-25 Max. :3.250 Max. :62505647
x4046 x4225 x4770 total_bags
Min. : 0 Min. : 0 Min. : 0 Min. : 0
1st Qu.: 854 1st Qu.: 3009 1st Qu.: 0 1st Qu.: 5089
Median : 8645 Median : 29061 Median : 185 Median : 39744
Mean : 293008 Mean : 295155 Mean : 22840 Mean : 239639
3rd Qu.: 111020 3rd Qu.: 150207 3rd Qu.: 6243 3rd Qu.: 110783
Max. :22743616 Max. :20470573 Max. :2546439 Max. :19373134
small_bags large_bags x_large_bags type
Min. : 0 Min. : 0 Min. : 0.0 Length:18249
1st Qu.: 2849 1st Qu.: 127 1st Qu.: 0.0 Class :character
Median : 26363 Median : 2648 Median : 0.0 Mode :character
Mean : 182195 Mean : 54338 Mean : 3106.4
3rd Qu.: 83338 3rd Qu.: 22029 3rd Qu.: 132.5
Max. :13384587 Max. :5719097 Max. :551693.7
year region
Min. :2015 Length:18249
1st Qu.:2015 Class :character
Median :2016 Mode :character
Mean :2016
3rd Qu.:2017
Max. :2018
head(avocados)
avocados %>%
distinct(region) %>%
summarise(number_of_regions = n())
avocados %>%
distinct(date) %>%
summarise(
number_of_dates = n(),
min_date = min(date),
max_date = max(date)
)
library(lubridate)
package ‘lubridate’ was built under R version 3.6.2
Attaching package: ‘lubridate’
The following object is masked from ‘package:HH’:
interval
The following objects are masked from ‘package:base’:
date, intersect, setdiff, union
trimmed_avocados <- avocados %>%
mutate(
quarter = as_factor(quarter(date)),
year = as_factor(year),
type = as_factor(type)
) %>%
dplyr::select(-c("x1", "date"))
alias(average_price ~ ., data = trimmed_avocados)
Model :
average_price ~ total_volume + x4046 + x4225 + x4770 + total_bags +
small_bags + large_bags + x_large_bags + type + year + region +
quarter
trimmed_avocados %>%
dplyr::select(-region) %>%
ggpairs()
ggsave("pairs_plot_choice1.png", width = 10, height = 10, units = "in")
trimmed_avocados %>%
ggplot(aes(x = region, y = average_price)) +
geom_boxplot() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
Test competing models with x4046, type, year, quarter and region:
model1a <- lm(average_price ~ x4046, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model1a)
summary(model1a)
Call:
lm(formula = average_price ~ x4046, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-0.98539 -0.29842 -0.03531 0.25459 1.82475
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.425e+00 2.993e-03 476.29 <2e-16 ***
x4046 -6.631e-08 2.305e-09 -28.77 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3939 on 18247 degrees of freedom
Multiple R-squared: 0.0434, Adjusted R-squared: 0.04334
F-statistic: 827.8 on 1 and 18247 DF, p-value: < 2.2e-16
model1b <- lm(average_price ~ type, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model1b)
summary(model1b)
Call:
lm(formula = average_price ~ type, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.21400 -0.20400 -0.02804 0.18600 1.59600
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.158040 0.003321 348.7 <2e-16 ***
typeorganic 0.495959 0.004697 105.6 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3173 on 18247 degrees of freedom
Multiple R-squared: 0.3793, Adjusted R-squared: 0.3792
F-statistic: 1.115e+04 on 1 and 18247 DF, p-value: < 2.2e-16
model1c <- lm(average_price ~ year, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model1c)
summary(model1c)
Call:
lm(formula = average_price ~ year, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.07513 -0.29513 -0.03559 0.25247 1.91136
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.375590 0.005280 260.546 < 2e-16 ***
year2016 -0.036951 0.007466 -4.949 7.52e-07 ***
year2017 0.139537 0.007432 18.776 < 2e-16 ***
year2018 -0.028060 0.012192 -2.301 0.0214 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3956 on 18245 degrees of freedom
Multiple R-squared: 0.03489, Adjusted R-squared: 0.03474
F-statistic: 219.9 on 3 and 18245 DF, p-value: < 2.2e-16
model1d <- lm(average_price ~ quarter, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model1d)
summary(model1d)
Call:
lm(formula = average_price ~ quarter, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-0.96859 -0.30503 -0.02859 0.25497 1.79497
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.306605 0.005316 245.769 <2e-16 ***
quarter2 0.068428 0.008077 8.472 <2e-16 ***
quarter3 0.206308 0.008076 25.545 <2e-16 ***
quarter4 0.151983 0.008019 18.952 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3946 on 18245 degrees of freedom
Multiple R-squared: 0.04006, Adjusted R-squared: 0.03991
F-statistic: 253.8 on 3 and 18245 DF, p-value: < 2.2e-16
model1e <- lm(average_price ~ region, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model1e)
summary(model1e)
Call:
lm(formula = average_price ~ region, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-0.97095 -0.28423 -0.03432 0.25207 1.76115
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.561036 0.020006 78.029 < 2e-16 ***
regionAtlanta -0.223077 0.028293 -7.885 3.33e-15 ***
regionBaltimoreWashington -0.026805 0.028293 -0.947 0.34344
regionBoise -0.212899 0.028293 -7.525 5.52e-14 ***
regionBoston -0.030148 0.028293 -1.066 0.28663
regionBuffaloRochester -0.044201 0.028293 -1.562 0.11824
regionCalifornia -0.165710 0.028293 -5.857 4.79e-09 ***
regionCharlotte 0.045000 0.028293 1.591 0.11173
regionChicago -0.004260 0.028293 -0.151 0.88031
regionCincinnatiDayton -0.351834 0.028293 -12.436 < 2e-16 ***
regionColumbus -0.308254 0.028293 -10.895 < 2e-16 ***
regionDallasFtWorth -0.475444 0.028293 -16.805 < 2e-16 ***
regionDenver -0.342456 0.028293 -12.104 < 2e-16 ***
regionDetroit -0.284941 0.028293 -10.071 < 2e-16 ***
regionGrandRapids -0.056036 0.028293 -1.981 0.04765 *
regionGreatLakes -0.222485 0.028293 -7.864 3.94e-15 ***
regionHarrisburgScranton -0.047751 0.028293 -1.688 0.09147 .
regionHartfordSpringfield 0.257604 0.028293 9.105 < 2e-16 ***
regionHouston -0.513107 0.028293 -18.136 < 2e-16 ***
regionIndianapolis -0.247041 0.028293 -8.732 < 2e-16 ***
regionJacksonville -0.050089 0.028293 -1.770 0.07668 .
regionLasVegas -0.180118 0.028293 -6.366 1.98e-10 ***
regionLosAngeles -0.345030 0.028293 -12.195 < 2e-16 ***
regionLouisville -0.274349 0.028293 -9.697 < 2e-16 ***
regionMiamiFtLauderdale -0.132544 0.028293 -4.685 2.82e-06 ***
regionMidsouth -0.156272 0.028293 -5.523 3.37e-08 ***
regionNashville -0.348935 0.028293 -12.333 < 2e-16 ***
regionNewOrleansMobile -0.256243 0.028293 -9.057 < 2e-16 ***
regionNewYork 0.166538 0.028293 5.886 4.02e-09 ***
regionNortheast 0.040888 0.028293 1.445 0.14843
regionNorthernNewEngland -0.083639 0.028293 -2.956 0.00312 **
regionOrlando -0.054822 0.028293 -1.938 0.05268 .
regionPhiladelphia 0.071095 0.028293 2.513 0.01199 *
regionPhoenixTucson -0.336598 0.028293 -11.897 < 2e-16 ***
regionPittsburgh -0.196716 0.028293 -6.953 3.70e-12 ***
regionPlains -0.124527 0.028293 -4.401 1.08e-05 ***
regionPortland -0.243314 0.028293 -8.600 < 2e-16 ***
regionRaleighGreensboro -0.005917 0.028293 -0.209 0.83434
regionRichmondNorfolk -0.269704 0.028293 -9.533 < 2e-16 ***
regionRoanoke -0.313107 0.028293 -11.067 < 2e-16 ***
regionSacramento 0.060533 0.028293 2.140 0.03241 *
regionSanDiego -0.162870 0.028293 -5.757 8.72e-09 ***
regionSanFrancisco 0.243166 0.028293 8.595 < 2e-16 ***
regionSeattle -0.118462 0.028293 -4.187 2.84e-05 ***
regionSouthCarolina -0.157751 0.028293 -5.576 2.50e-08 ***
regionSouthCentral -0.459793 0.028293 -16.251 < 2e-16 ***
regionSoutheast -0.163018 0.028293 -5.762 8.45e-09 ***
regionSpokane -0.115444 0.028293 -4.080 4.52e-05 ***
regionStLouis -0.130414 0.028293 -4.609 4.06e-06 ***
regionSyracuse -0.040710 0.028293 -1.439 0.15020
regionTampa -0.152189 0.028293 -5.379 7.58e-08 ***
regionTotalUS -0.242012 0.028293 -8.554 < 2e-16 ***
regionWest -0.288817 0.028293 -10.208 < 2e-16 ***
regionWestTexNewMexico -0.299334 0.028356 -10.556 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3678 on 18195 degrees of freedom
Multiple R-squared: 0.1681, Adjusted R-squared: 0.1657
F-statistic: 69.38 on 53 and 18195 DF, p-value: < 2.2e-16
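Reading five summaries side by side is error-prone, so the same comparison can be collected into one vector of adjusted R-squared values. The sketch below is an illustration on the built-in mtcars data, with a hypothetical choice of three predictors; swapping in trimmed_avocados and the predictors tested above is an assumption about how you would adapt it.

```r
# Fit one single-predictor model per candidate and collect adjusted R-squared.
# mtcars and the predictor names here are stand-ins for illustration.
predictors <- c("wt", "hp", "qsec")
adj_r2 <- sapply(predictors, function(v) {
  fit <- lm(reformulate(v, response = "mpg"), data = mtcars)
  summary(fit)$adj.r.squared
})
# Highest adjusted R-squared first
sort(adj_r2, decreasing = TRUE)
```

Adjusted R-squared is used rather than plain R-squared because the candidates differ in how many coefficients they estimate (region alone uses 53).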
model1b with type is best (adjusted R-squared 0.379, against 0.166 for region and under 0.05 for each of the others), so we’ll keep that and re-run ggpairs() with the residuals (again omitting region).
avocados_remaining_resid <- trimmed_avocados %>%
add_residuals(model1b) %>%
dplyr::select(-c("average_price", "type", "region"))
ggpairs(avocados_remaining_resid)
ggsave("pairs_plot_choice2.png", width = 10, height = 10, units = "in")
trimmed_avocados %>%
add_residuals(model1b) %>%
ggplot(aes(x = region, y = resid)) +
geom_boxplot() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
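The residual plots above rest on one idea: a residual is the observed value minus the model's prediction, so plotting residuals against a new predictor shows what that predictor could explain beyond the current model. A minimal base-R sketch of the same calculation, on the built-in mtcars data and without modelr, is given here as an illustration only.

```r
# Residuals by hand: observed response minus fitted prediction.
# mtcars stands in for the avocado data; no modelr needed.
fit <- lm(mpg ~ wt, data = mtcars)
with_resid <- mtcars
with_resid$resid <- with_resid$mpg - predict(fit, newdata = with_resid)

# Sanity check against R's own residuals()
isTRUE(all.equal(unname(with_resid$resid), unname(residuals(fit))))

# Regressing the residuals on another variable asks what wt left unexplained
leftover <- lm(resid ~ hp, data = with_resid)
summary(leftover)$r.squared
```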
Looks like x4046, year, quarter and region are our next strong contenders:
model2a <- lm(average_price ~ type + x4046, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model2a)
summary(model2a)
Call:
lm(formula = average_price ~ type + x4046, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.21416 -0.20029 -0.02736 0.18591 1.59589
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.171e+00 3.485e-03 336.13 <2e-16 ***
typeorganic 4.827e-01 4.802e-03 100.52 <2e-16 ***
x4046 -2.323e-08 1.898e-09 -12.24 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.316 on 18246 degrees of freedom
Multiple R-squared: 0.3843, Adjusted R-squared: 0.3843
F-statistic: 5695 on 2 and 18246 DF, p-value: < 2.2e-16
model2b <- lm(average_price ~ type + year, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model2b)
summary(model2b)
Call:
lm(formula = average_price ~ type + year, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.32320 -0.18722 -0.01722 0.18278 1.66337
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.127645 0.004704 239.735 < 2e-16 ***
typeorganic 0.495980 0.004563 108.685 < 2e-16 ***
year2016 -0.036995 0.005817 -6.360 2.07e-10 ***
year2017 0.139580 0.005790 24.107 < 2e-16 ***
year2018 -0.028104 0.009499 -2.959 0.00309 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3082 on 18244 degrees of freedom
Multiple R-squared: 0.4142, Adjusted R-squared: 0.4141
F-statistic: 3225 on 4 and 18244 DF, p-value: < 2.2e-16
model2c <- lm(average_price ~ type + quarter, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model2c)
summary(model2c)
Call:
lm(formula = average_price ~ type + quarter, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.11458 -0.20089 -0.02458 0.18542 1.54687
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.058626 0.004718 224.38 <2e-16 ***
typeorganic 0.495958 0.004543 109.16 <2e-16 ***
quarter2 0.068546 0.006282 10.91 <2e-16 ***
quarter3 0.206308 0.006281 32.84 <2e-16 ***
quarter4 0.152040 0.006237 24.38 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3069 on 18244 degrees of freedom
Multiple R-squared: 0.4193, Adjusted R-squared: 0.4192
F-statistic: 3294 on 4 and 18244 DF, p-value: < 2.2e-16
model2d <- lm(average_price ~ type + region, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model2d)
summary(model2d)
Call:
lm(formula = average_price ~ type + region, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.09858 -0.16716 -0.01814 0.14692 1.51320
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.313079 0.014894 88.159 < 2e-16 ***
typeorganic 0.495912 0.004017 123.452 < 2e-16 ***
regionAtlanta -0.223077 0.020871 -10.688 < 2e-16 ***
regionBaltimoreWashington -0.026805 0.020871 -1.284 0.19906
regionBoise -0.212899 0.020871 -10.201 < 2e-16 ***
regionBoston -0.030148 0.020871 -1.444 0.14863
regionBuffaloRochester -0.044201 0.020871 -2.118 0.03421 *
regionCalifornia -0.165710 0.020871 -7.940 2.15e-15 ***
regionCharlotte 0.045000 0.020871 2.156 0.03109 *
regionChicago -0.004260 0.020871 -0.204 0.83826
regionCincinnatiDayton -0.351834 0.020871 -16.857 < 2e-16 ***
regionColumbus -0.308254 0.020871 -14.769 < 2e-16 ***
regionDallasFtWorth -0.475444 0.020871 -22.780 < 2e-16 ***
regionDenver -0.342456 0.020871 -16.408 < 2e-16 ***
regionDetroit -0.284941 0.020871 -13.652 < 2e-16 ***
regionGrandRapids -0.056036 0.020871 -2.685 0.00726 **
regionGreatLakes -0.222485 0.020871 -10.660 < 2e-16 ***
regionHarrisburgScranton -0.047751 0.020871 -2.288 0.02216 *
regionHartfordSpringfield 0.257604 0.020871 12.342 < 2e-16 ***
regionHouston -0.513107 0.020871 -24.584 < 2e-16 ***
regionIndianapolis -0.247041 0.020871 -11.836 < 2e-16 ***
regionJacksonville -0.050089 0.020871 -2.400 0.01641 *
regionLasVegas -0.180118 0.020871 -8.630 < 2e-16 ***
regionLosAngeles -0.345030 0.020871 -16.531 < 2e-16 ***
regionLouisville -0.274349 0.020871 -13.145 < 2e-16 ***
regionMiamiFtLauderdale -0.132544 0.020871 -6.351 2.20e-10 ***
regionMidsouth -0.156272 0.020871 -7.487 7.35e-14 ***
regionNashville -0.348935 0.020871 -16.718 < 2e-16 ***
regionNewOrleansMobile -0.256243 0.020871 -12.277 < 2e-16 ***
regionNewYork 0.166538 0.020871 7.979 1.56e-15 ***
regionNortheast 0.040888 0.020871 1.959 0.05013 .
regionNorthernNewEngland -0.083639 0.020871 -4.007 6.16e-05 ***
regionOrlando -0.054822 0.020871 -2.627 0.00863 **
regionPhiladelphia 0.071095 0.020871 3.406 0.00066 ***
regionPhoenixTucson -0.336598 0.020871 -16.127 < 2e-16 ***
regionPittsburgh -0.196716 0.020871 -9.425 < 2e-16 ***
regionPlains -0.124527 0.020871 -5.966 2.47e-09 ***
regionPortland -0.243314 0.020871 -11.658 < 2e-16 ***
regionRaleighGreensboro -0.005917 0.020871 -0.284 0.77679
regionRichmondNorfolk -0.269704 0.020871 -12.922 < 2e-16 ***
regionRoanoke -0.313107 0.020871 -15.002 < 2e-16 ***
regionSacramento 0.060533 0.020871 2.900 0.00373 **
regionSanDiego -0.162870 0.020871 -7.803 6.35e-15 ***
regionSanFrancisco 0.243166 0.020871 11.651 < 2e-16 ***
regionSeattle -0.118462 0.020871 -5.676 1.40e-08 ***
regionSouthCarolina -0.157751 0.020871 -7.558 4.28e-14 ***
regionSouthCentral -0.459793 0.020871 -22.030 < 2e-16 ***
regionSoutheast -0.163018 0.020871 -7.811 6.00e-15 ***
regionSpokane -0.115444 0.020871 -5.531 3.22e-08 ***
regionStLouis -0.130414 0.020871 -6.248 4.24e-10 ***
regionSyracuse -0.040710 0.020871 -1.951 0.05113 .
regionTampa -0.152189 0.020871 -7.292 3.18e-13 ***
regionTotalUS -0.242012 0.020871 -11.595 < 2e-16 ***
regionWest -0.288817 0.020871 -13.838 < 2e-16 ***
regionWestTexNewMexico -0.297114 0.020918 -14.204 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2713 on 18194 degrees of freedom
Multiple R-squared: 0.5473, Adjusted R-squared: 0.546
F-statistic: 407.4 on 54 and 18194 DF, p-value: < 2.2e-16
So model2d with type and region comes out as better here (adjusted R-squared 0.546). Some individual region coefficients are not significant at the 0.05 level, so let’s run an anova() against model1b to test whether region is worth including as a whole.
anova(model1b, model2d)
Analysis of Variance Table
Model 1: average_price ~ type
Model 2: average_price ~ type + region
Res.Df RSS Df Sum of Sq F Pr(>F)
1 18247 1836.7
2 18194 1339.4 53 497.26 127.44 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
It seems region is significant overall, so we’ll keep it in!
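The F statistic in the anova() table can be rebuilt from the printed residual sums of squares and degrees of freedom, which makes it clearer what "significant overall" means: the drop in RSS per extra coefficient is compared with the residual variance of the larger model. The sketch below uses the rounded values from the table, so it only agrees with the printed F to about one decimal place.

```r
# Nested-model F test rebuilt from the (rounded) anova() table values.
rss_small <- 1836.7; df_small <- 18247   # average_price ~ type
rss_big   <- 1339.4; df_big   <- 18194   # average_price ~ type + region

extra_df <- df_small - df_big            # 53 extra region coefficients
f_stat <- ((rss_small - rss_big) / extra_df) / (rss_big / df_big)
p_value <- pf(f_stat, extra_df, df_big, lower.tail = FALSE)

f_stat   # close to the 127.44 printed by anova()
p_value  # far below 0.05
```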
avocados_remaining_resid <- trimmed_avocados %>%
add_residuals(model2d) %>%
dplyr::select(-c("average_price", "type", "region"))
ggpairs(avocados_remaining_resid)
ggsave("pairs_plot_choice3.png", width = 10, height = 10, units = "in")
model3a <- lm(average_price ~ type + region + x_large_bags, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model3a)
summary(model3a)
Call:
lm(formula = average_price ~ type + region + x_large_bags, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.10024 -0.16726 -0.01734 0.14591 1.51156
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.311e+00 1.489e-02 88.033 < 2e-16 ***
typeorganic 5.001e-01 4.101e-03 121.953 < 2e-16 ***
regionAtlanta -2.235e-01 2.086e-02 -10.718 < 2e-16 ***
regionBaltimoreWashington -2.713e-02 2.086e-02 -1.301 0.193298
regionBoise -2.128e-01 2.086e-02 -10.204 < 2e-16 ***
regionBoston -3.023e-02 2.086e-02 -1.449 0.147234
regionBuffaloRochester -4.428e-02 2.086e-02 -2.123 0.033774 *
regionCalifornia -1.762e-01 2.096e-02 -8.408 < 2e-16 ***
regionCharlotte 4.495e-02 2.086e-02 2.155 0.031177 *
regionChicago -4.936e-03 2.086e-02 -0.237 0.812924
regionCincinnatiDayton -3.523e-01 2.086e-02 -16.890 < 2e-16 ***
regionColumbus -3.086e-01 2.086e-02 -14.796 < 2e-16 ***
regionDallasFtWorth -4.762e-01 2.086e-02 -22.832 < 2e-16 ***
regionDenver -3.425e-01 2.086e-02 -16.420 < 2e-16 ***
regionDetroit -2.882e-01 2.087e-02 -13.810 < 2e-16 ***
regionGrandRapids -5.764e-02 2.086e-02 -2.763 0.005731 **
regionGreatLakes -2.353e-01 2.101e-02 -11.198 < 2e-16 ***
regionHarrisburgScranton -4.798e-02 2.086e-02 -2.300 0.021451 *
regionHartfordSpringfield 2.575e-01 2.086e-02 12.347 < 2e-16 ***
regionHouston -5.137e-01 2.086e-02 -24.628 < 2e-16 ***
regionIndianapolis -2.475e-01 2.086e-02 -11.867 < 2e-16 ***
regionJacksonville -5.021e-02 2.086e-02 -2.407 0.016074 *
regionLasVegas -1.801e-01 2.086e-02 -8.633 < 2e-16 ***
regionLosAngeles -3.532e-01 2.092e-02 -16.881 < 2e-16 ***
regionLouisville -2.745e-01 2.086e-02 -13.160 < 2e-16 ***
regionMiamiFtLauderdale -1.331e-01 2.086e-02 -6.380 1.81e-10 ***
regionMidsouth -1.590e-01 2.086e-02 -7.619 2.68e-14 ***
regionNashville -3.491e-01 2.086e-02 -16.736 < 2e-16 ***
regionNewOrleansMobile -2.572e-01 2.086e-02 -12.330 < 2e-16 ***
regionNewYork 1.659e-01 2.086e-02 7.954 1.91e-15 ***
regionNortheast 3.834e-02 2.086e-02 1.838 0.066151 .
regionNorthernNewEngland -8.377e-02 2.086e-02 -4.017 5.93e-05 ***
regionOrlando -5.523e-02 2.086e-02 -2.648 0.008111 **
regionPhiladelphia 7.097e-02 2.086e-02 3.403 0.000669 ***
regionPhoenixTucson -3.368e-01 2.086e-02 -16.149 < 2e-16 ***
regionPittsburgh -1.967e-01 2.086e-02 -9.433 < 2e-16 ***
regionPlains -1.267e-01 2.086e-02 -6.072 1.29e-09 ***
regionPortland -2.434e-01 2.086e-02 -11.669 < 2e-16 ***
regionRaleighGreensboro -6.021e-03 2.086e-02 -0.289 0.772828
regionRichmondNorfolk -2.699e-01 2.086e-02 -12.939 < 2e-16 ***
regionRoanoke -3.132e-01 2.086e-02 -15.015 < 2e-16 ***
regionSacramento 6.020e-02 2.086e-02 2.886 0.003904 **
regionSanDiego -1.631e-01 2.086e-02 -7.819 5.64e-15 ***
regionSanFrancisco 2.428e-01 2.086e-02 11.642 < 2e-16 ***
regionSeattle -1.185e-01 2.086e-02 -5.682 1.35e-08 ***
regionSouthCarolina -1.581e-01 2.086e-02 -7.581 3.59e-14 ***
regionSouthCentral -4.650e-01 2.088e-02 -22.268 < 2e-16 ***
regionSoutheast -1.680e-01 2.088e-02 -8.046 9.10e-16 ***
regionSpokane -1.154e-01 2.086e-02 -5.531 3.22e-08 ***
regionStLouis -1.308e-01 2.086e-02 -6.270 3.69e-10 ***
regionSyracuse -4.071e-02 2.086e-02 -1.952 0.050993 .
regionTampa -1.526e-01 2.086e-02 -7.315 2.68e-13 ***
regionTotalUS -2.852e-01 2.255e-02 -12.648 < 2e-16 ***
regionWest -2.904e-01 2.086e-02 -13.922 < 2e-16 ***
regionWestTexNewMexico -2.976e-01 2.090e-02 -14.238 < 2e-16 ***
x_large_bags 6.810e-07 1.351e-07 5.040 4.70e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2711 on 18193 degrees of freedom
Multiple R-squared: 0.548, Adjusted R-squared: 0.5466
F-statistic: 401 on 55 and 18193 DF, p-value: < 2.2e-16
model3b <- lm(average_price ~ type + region + year, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model3b)
summary(model3b)
Call:
lm(formula = average_price ~ type + region + year, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.1532 -0.1497 -0.0060 0.1419 1.4849
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.282672 0.014600 87.857 < 2e-16 ***
typeorganic 0.495933 0.003859 128.501 < 2e-16 ***
regionAtlanta -0.223077 0.020052 -11.125 < 2e-16 ***
regionBaltimoreWashington -0.026805 0.020052 -1.337 0.181322
regionBoise -0.212899 0.020052 -10.617 < 2e-16 ***
regionBoston -0.030148 0.020052 -1.503 0.132735
regionBuffaloRochester -0.044201 0.020052 -2.204 0.027515 *
regionCalifornia -0.165710 0.020052 -8.264 < 2e-16 ***
regionCharlotte 0.045000 0.020052 2.244 0.024835 *
regionChicago -0.004260 0.020052 -0.212 0.831748
regionCincinnatiDayton -0.351834 0.020052 -17.546 < 2e-16 ***
regionColumbus -0.308254 0.020052 -15.373 < 2e-16 ***
regionDallasFtWorth -0.475444 0.020052 -23.710 < 2e-16 ***
regionDenver -0.342456 0.020052 -17.078 < 2e-16 ***
regionDetroit -0.284941 0.020052 -14.210 < 2e-16 ***
regionGrandRapids -0.056036 0.020052 -2.794 0.005204 **
regionGreatLakes -0.222485 0.020052 -11.095 < 2e-16 ***
regionHarrisburgScranton -0.047751 0.020052 -2.381 0.017259 *
regionHartfordSpringfield 0.257604 0.020052 12.847 < 2e-16 ***
regionHouston -0.513107 0.020052 -25.589 < 2e-16 ***
regionIndianapolis -0.247041 0.020052 -12.320 < 2e-16 ***
regionJacksonville -0.050089 0.020052 -2.498 0.012501 *
regionLasVegas -0.180118 0.020052 -8.982 < 2e-16 ***
regionLosAngeles -0.345030 0.020052 -17.207 < 2e-16 ***
regionLouisville -0.274349 0.020052 -13.682 < 2e-16 ***
regionMiamiFtLauderdale -0.132544 0.020052 -6.610 3.95e-11 ***
regionMidsouth -0.156272 0.020052 -7.793 6.88e-15 ***
regionNashville -0.348935 0.020052 -17.401 < 2e-16 ***
regionNewOrleansMobile -0.256243 0.020052 -12.779 < 2e-16 ***
regionNewYork 0.166538 0.020052 8.305 < 2e-16 ***
regionNortheast 0.040888 0.020052 2.039 0.041459 *
regionNorthernNewEngland -0.083639 0.020052 -4.171 3.05e-05 ***
regionOrlando -0.054822 0.020052 -2.734 0.006263 **
regionPhiladelphia 0.071095 0.020052 3.545 0.000393 ***
regionPhoenixTucson -0.336598 0.020052 -16.786 < 2e-16 ***
regionPittsburgh -0.196716 0.020052 -9.810 < 2e-16 ***
regionPlains -0.124527 0.020052 -6.210 5.41e-10 ***
regionPortland -0.243314 0.020052 -12.134 < 2e-16 ***
regionRaleighGreensboro -0.005917 0.020052 -0.295 0.767930
regionRichmondNorfolk -0.269704 0.020052 -13.450 < 2e-16 ***
regionRoanoke -0.313107 0.020052 -15.615 < 2e-16 ***
regionSacramento 0.060533 0.020052 3.019 0.002542 **
regionSanDiego -0.162870 0.020052 -8.122 4.86e-16 ***
regionSanFrancisco 0.243166 0.020052 12.127 < 2e-16 ***
regionSeattle -0.118462 0.020052 -5.908 3.53e-09 ***
regionSouthCarolina -0.157751 0.020052 -7.867 3.83e-15 ***
regionSouthCentral -0.459793 0.020052 -22.930 < 2e-16 ***
regionSoutheast -0.163018 0.020052 -8.130 4.58e-16 ***
regionSpokane -0.115444 0.020052 -5.757 8.69e-09 ***
regionStLouis -0.130414 0.020052 -6.504 8.04e-11 ***
regionSyracuse -0.040710 0.020052 -2.030 0.042350 *
regionTampa -0.152189 0.020052 -7.590 3.36e-14 ***
regionTotalUS -0.242012 0.020052 -12.069 < 2e-16 ***
regionWest -0.288817 0.020052 -14.403 < 2e-16 ***
regionWestTexNewMexico -0.296552 0.020097 -14.756 < 2e-16 ***
year2016 -0.036970 0.004920 -7.515 5.96e-14 ***
year2017 0.139555 0.004897 28.500 < 2e-16 ***
year2018 -0.028078 0.008033 -3.495 0.000475 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2607 on 18191 degrees of freedom
Multiple R-squared: 0.5822, Adjusted R-squared: 0.5809
F-statistic: 444.8 on 57 and 18191 DF, p-value: < 2.2e-16
model3c <- lm(average_price ~ type + region + quarter, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model3c)
summary(model3c)
Call:
lm(formula = average_price ~ type + region + quarter, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.06767 -0.15971 -0.01185 0.14629 1.54411
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.213689 0.014517 83.603 < 2e-16 ***
typeorganic 0.495911 0.003835 129.296 < 2e-16 ***
regionAtlanta -0.223077 0.019928 -11.194 < 2e-16 ***
regionBaltimoreWashington -0.026805 0.019928 -1.345 0.178619
regionBoise -0.212899 0.019928 -10.683 < 2e-16 ***
regionBoston -0.030148 0.019928 -1.513 0.130339
regionBuffaloRochester -0.044201 0.019928 -2.218 0.026565 *
regionCalifornia -0.165710 0.019928 -8.315 < 2e-16 ***
regionCharlotte 0.045000 0.019928 2.258 0.023950 *
regionChicago -0.004260 0.019928 -0.214 0.830716
regionCincinnatiDayton -0.351834 0.019928 -17.655 < 2e-16 ***
regionColumbus -0.308254 0.019928 -15.468 < 2e-16 ***
regionDallasFtWorth -0.475444 0.019928 -23.858 < 2e-16 ***
regionDenver -0.342456 0.019928 -17.185 < 2e-16 ***
regionDetroit -0.284941 0.019928 -14.298 < 2e-16 ***
regionGrandRapids -0.056036 0.019928 -2.812 0.004931 **
regionGreatLakes -0.222485 0.019928 -11.164 < 2e-16 ***
regionHarrisburgScranton -0.047751 0.019928 -2.396 0.016577 *
regionHartfordSpringfield 0.257604 0.019928 12.927 < 2e-16 ***
regionHouston -0.513107 0.019928 -25.748 < 2e-16 ***
regionIndianapolis -0.247041 0.019928 -12.397 < 2e-16 ***
regionJacksonville -0.050089 0.019928 -2.513 0.011963 *
regionLasVegas -0.180118 0.019928 -9.038 < 2e-16 ***
regionLosAngeles -0.345030 0.019928 -17.314 < 2e-16 ***
regionLouisville -0.274349 0.019928 -13.767 < 2e-16 ***
regionMiamiFtLauderdale -0.132544 0.019928 -6.651 2.99e-11 ***
regionMidsouth -0.156272 0.019928 -7.842 4.69e-15 ***
regionNashville -0.348935 0.019928 -17.510 < 2e-16 ***
regionNewOrleansMobile -0.256243 0.019928 -12.858 < 2e-16 ***
regionNewYork 0.166538 0.019928 8.357 < 2e-16 ***
regionNortheast 0.040888 0.019928 2.052 0.040208 *
regionNorthernNewEngland -0.083639 0.019928 -4.197 2.72e-05 ***
regionOrlando -0.054822 0.019928 -2.751 0.005947 **
regionPhiladelphia 0.071095 0.019928 3.568 0.000361 ***
regionPhoenixTucson -0.336598 0.019928 -16.891 < 2e-16 ***
regionPittsburgh -0.196716 0.019928 -9.871 < 2e-16 ***
regionPlains -0.124527 0.019928 -6.249 4.23e-10 ***
regionPortland -0.243314 0.019928 -12.210 < 2e-16 ***
regionRaleighGreensboro -0.005917 0.019928 -0.297 0.766527
regionRichmondNorfolk -0.269704 0.019928 -13.534 < 2e-16 ***
regionRoanoke -0.313107 0.019928 -15.712 < 2e-16 ***
regionSacramento 0.060533 0.019928 3.038 0.002389 **
regionSanDiego -0.162870 0.019928 -8.173 3.21e-16 ***
regionSanFrancisco 0.243166 0.019928 12.202 < 2e-16 ***
regionSeattle -0.118462 0.019928 -5.944 2.82e-09 ***
regionSouthCarolina -0.157751 0.019928 -7.916 2.59e-15 ***
regionSouthCentral -0.459793 0.019928 -23.073 < 2e-16 ***
regionSoutheast -0.163018 0.019928 -8.180 3.02e-16 ***
regionSpokane -0.115444 0.019928 -5.793 7.03e-09 ***
regionStLouis -0.130414 0.019928 -6.544 6.14e-11 ***
regionSyracuse -0.040710 0.019928 -2.043 0.041082 *
regionTampa -0.152189 0.019928 -7.637 2.33e-14 ***
regionTotalUS -0.242012 0.019928 -12.144 < 2e-16 ***
regionWest -0.288817 0.019928 -14.493 < 2e-16 ***
regionWestTexNewMexico -0.297141 0.019973 -14.877 < 2e-16 ***
quarter2 0.068479 0.005303 12.912 < 2e-16 ***
quarter3 0.206308 0.005303 38.906 < 2e-16 ***
quarter4 0.152007 0.005265 28.869 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2591 on 18191 degrees of freedom
Multiple R-squared: 0.5874, Adjusted R-squared: 0.5861
F-statistic: 454.3 on 57 and 18191 DF, p-value: < 2.2e-16
So model3c, with type, region and quarter, wins out here: its adjusted R-squared is 0.5861, against 0.5809 for the model that uses year in place of quarter. The diagnostics still look reasonable, with perhaps some mild heteroscedasticity.
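The choice here rests on adjusted R-squared, so it can help to tabulate that figure for the candidate fits side by side. Below is a minimal sketch, not part of the original analysis: `compare_adj_r2` is a hypothetical helper name, and the built-in `mtcars` data stands in for `trimmed_avocados`, which is not reproduced here.

```r
# Hypothetical helper: collect adjusted R-squared and residual standard
# error for a named list of lm fits, best adjusted R-squared first.
compare_adj_r2 <- function(models) {
  out <- data.frame(
    model  = names(models),
    adj_r2 = sapply(models, function(m) summary(m)$adj.r.squared),
    sigma  = sapply(models, function(m) summary(m)$sigma),
    row.names = NULL
  )
  out[order(-out$adj_r2), ]
}

# Stand-in data (mtcars), NOT the avocado data.
candidates <- list(
  with_cyl  = lm(mpg ~ wt + factor(cyl), data = mtcars),
  with_gear = lm(mpg ~ wt + factor(gear), data = mtcars)
)

comparison <- compare_adj_r2(candidates)
print(comparison)
```

On the avocado fits, the same helper could presumably be called with a named list of the competing models from this step, for example `list(model3c = model3c)` plus its rivals.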
avocados_remaining_resid <- trimmed_avocados %>%
add_residuals(model3c) %>%
dplyr::select(-c("average_price", "type", "region", "quarter"))
ggpairs(avocados_remaining_resid)
ggsave("pairs_plot_choice4.png", width = 10, height = 10, units = "in")
model4a <- lm(average_price ~ type + region + quarter + x_large_bags, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model4a)
summary(model4a)
Call:
lm(formula = average_price ~ type + region + quarter + x_large_bags,
data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.06889 -0.16013 -0.01154 0.14553 1.54291
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.212e+00 1.451e-02 83.493 < 2e-16 ***
typeorganic 4.998e-01 3.916e-03 127.614 < 2e-16 ***
regionAtlanta -2.235e-01 1.992e-02 -11.222 < 2e-16 ***
regionBaltimoreWashington -2.711e-02 1.992e-02 -1.361 0.173535
regionBoise -2.128e-01 1.992e-02 -10.687 < 2e-16 ***
regionBoston -3.022e-02 1.992e-02 -1.518 0.129137
regionBuffaloRochester -4.427e-02 1.992e-02 -2.223 0.026233 *
regionCalifornia -1.753e-01 2.002e-02 -8.759 < 2e-16 ***
regionCharlotte 4.495e-02 1.992e-02 2.257 0.024015 *
regionChicago -4.877e-03 1.992e-02 -0.245 0.806549
regionCincinnatiDayton -3.522e-01 1.992e-02 -17.686 < 2e-16 ***
regionColumbus -3.086e-01 1.992e-02 -15.494 < 2e-16 ***
regionDallasFtWorth -4.762e-01 1.992e-02 -23.908 < 2e-16 ***
regionDenver -3.425e-01 1.992e-02 -17.196 < 2e-16 ***
regionDetroit -2.879e-01 1.993e-02 -14.449 < 2e-16 ***
regionGrandRapids -5.750e-02 1.992e-02 -2.887 0.003898 **
regionGreatLakes -2.342e-01 2.006e-02 -11.671 < 2e-16 ***
regionHarrisburgScranton -4.796e-02 1.992e-02 -2.408 0.016054 *
regionHartfordSpringfield 2.575e-01 1.992e-02 12.931 < 2e-16 ***
regionHouston -5.136e-01 1.992e-02 -25.789 < 2e-16 ***
regionIndianapolis -2.475e-01 1.992e-02 -12.426 < 2e-16 ***
regionJacksonville -5.020e-02 1.992e-02 -2.521 0.011720 *
regionLasVegas -1.801e-01 1.992e-02 -9.041 < 2e-16 ***
regionLosAngeles -3.524e-01 1.998e-02 -17.644 < 2e-16 ***
regionLouisville -2.745e-01 1.992e-02 -13.781 < 2e-16 ***
regionMiamiFtLauderdale -1.330e-01 1.992e-02 -6.679 2.47e-11 ***
regionMidsouth -1.587e-01 1.992e-02 -7.967 1.72e-15 ***
regionNashville -3.491e-01 1.992e-02 -17.527 < 2e-16 ***
regionNewOrleansMobile -2.571e-01 1.992e-02 -12.909 < 2e-16 ***
regionNewYork 1.660e-01 1.992e-02 8.333 < 2e-16 ***
regionNortheast 3.856e-02 1.992e-02 1.936 0.052939 .
regionNorthernNewEngland -8.376e-02 1.992e-02 -4.206 2.61e-05 ***
regionOrlando -5.519e-02 1.992e-02 -2.771 0.005592 **
regionPhiladelphia 7.098e-02 1.992e-02 3.564 0.000366 ***
regionPhoenixTucson -3.368e-01 1.992e-02 -16.911 < 2e-16 ***
regionPittsburgh -1.967e-01 1.992e-02 -9.879 < 2e-16 ***
regionPlains -1.265e-01 1.992e-02 -6.350 2.20e-10 ***
regionPortland -2.434e-01 1.992e-02 -12.220 < 2e-16 ***
regionRaleighGreensboro -6.012e-03 1.992e-02 -0.302 0.762753
regionRichmondNorfolk -2.699e-01 1.992e-02 -13.549 < 2e-16 ***
regionRoanoke -3.132e-01 1.992e-02 -15.725 < 2e-16 ***
regionSacramento 6.023e-02 1.992e-02 3.024 0.002497 **
regionSanDiego -1.631e-01 1.992e-02 -8.187 2.85e-16 ***
regionSanFrancisco 2.429e-01 1.992e-02 12.194 < 2e-16 ***
regionSeattle -1.185e-01 1.992e-02 -5.950 2.72e-09 ***
regionSouthCarolina -1.581e-01 1.992e-02 -7.938 2.18e-15 ***
regionSouthCentral -4.646e-01 1.994e-02 -23.297 < 2e-16 ***
regionSoutheast -1.676e-01 1.994e-02 -8.404 < 2e-16 ***
regionSpokane -1.154e-01 1.992e-02 -5.793 7.02e-09 ***
regionStLouis -1.307e-01 1.992e-02 -6.565 5.35e-11 ***
regionSyracuse -4.071e-02 1.992e-02 -2.044 0.040974 *
regionTampa -1.525e-01 1.992e-02 -7.659 1.96e-14 ***
regionTotalUS -2.814e-01 2.153e-02 -13.068 < 2e-16 ***
regionWest -2.903e-01 1.992e-02 -14.573 < 2e-16 ***
regionWestTexNewMexico -2.976e-01 1.996e-02 -14.910 < 2e-16 ***
quarter2 6.806e-02 5.301e-03 12.839 < 2e-16 ***
quarter3 2.055e-01 5.302e-03 38.761 < 2e-16 ***
quarter4 1.527e-01 5.264e-03 29.001 < 2e-16 ***
x_large_bags 6.215e-07 1.292e-07 4.810 1.52e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2589 on 18190 degrees of freedom
Multiple R-squared: 0.5879, Adjusted R-squared: 0.5866
F-statistic: 447.4 on 58 and 18190 DF, p-value: < 2.2e-16
model4b <- lm(average_price ~ type + region + quarter + year, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model4b)
summary(model4b)
Call:
lm(formula = average_price ~ type + region + quarter + year,
data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.03683 -0.14588 -0.00412 0.14386 1.43930
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.167184 0.014290 81.677 < 2e-16 ***
typeorganic 0.495930 0.003675 134.950 < 2e-16 ***
regionAtlanta -0.223077 0.019094 -11.683 < 2e-16 ***
regionBaltimoreWashington -0.026805 0.019094 -1.404 0.160383
regionBoise -0.212899 0.019094 -11.150 < 2e-16 ***
regionBoston -0.030148 0.019094 -1.579 0.114368
regionBuffaloRochester -0.044201 0.019094 -2.315 0.020627 *
regionCalifornia -0.165710 0.019094 -8.679 < 2e-16 ***
regionCharlotte 0.045000 0.019094 2.357 0.018445 *
regionChicago -0.004260 0.019094 -0.223 0.823439
regionCincinnatiDayton -0.351834 0.019094 -18.427 < 2e-16 ***
regionColumbus -0.308254 0.019094 -16.144 < 2e-16 ***
regionDallasFtWorth -0.475444 0.019094 -24.900 < 2e-16 ***
regionDenver -0.342456 0.019094 -17.935 < 2e-16 ***
regionDetroit -0.284941 0.019094 -14.923 < 2e-16 ***
regionGrandRapids -0.056036 0.019094 -2.935 0.003342 **
regionGreatLakes -0.222485 0.019094 -11.652 < 2e-16 ***
regionHarrisburgScranton -0.047751 0.019094 -2.501 0.012397 *
regionHartfordSpringfield 0.257604 0.019094 13.491 < 2e-16 ***
regionHouston -0.513107 0.019094 -26.873 < 2e-16 ***
regionIndianapolis -0.247041 0.019094 -12.938 < 2e-16 ***
regionJacksonville -0.050089 0.019094 -2.623 0.008716 **
regionLasVegas -0.180118 0.019094 -9.433 < 2e-16 ***
regionLosAngeles -0.345030 0.019094 -18.070 < 2e-16 ***
regionLouisville -0.274349 0.019094 -14.368 < 2e-16 ***
regionMiamiFtLauderdale -0.132544 0.019094 -6.942 4.00e-12 ***
regionMidsouth -0.156272 0.019094 -8.184 2.91e-16 ***
regionNashville -0.348935 0.019094 -18.275 < 2e-16 ***
regionNewOrleansMobile -0.256243 0.019094 -13.420 < 2e-16 ***
regionNewYork 0.166538 0.019094 8.722 < 2e-16 ***
regionNortheast 0.040888 0.019094 2.141 0.032255 *
regionNorthernNewEngland -0.083639 0.019094 -4.380 1.19e-05 ***
regionOrlando -0.054822 0.019094 -2.871 0.004094 **
regionPhiladelphia 0.071095 0.019094 3.723 0.000197 ***
regionPhoenixTucson -0.336598 0.019094 -17.629 < 2e-16 ***
regionPittsburgh -0.196716 0.019094 -10.303 < 2e-16 ***
regionPlains -0.124527 0.019094 -6.522 7.13e-11 ***
regionPortland -0.243314 0.019094 -12.743 < 2e-16 ***
regionRaleighGreensboro -0.005917 0.019094 -0.310 0.756641
regionRichmondNorfolk -0.269704 0.019094 -14.125 < 2e-16 ***
regionRoanoke -0.313107 0.019094 -16.398 < 2e-16 ***
regionSacramento 0.060533 0.019094 3.170 0.001526 **
regionSanDiego -0.162870 0.019094 -8.530 < 2e-16 ***
regionSanFrancisco 0.243166 0.019094 12.735 < 2e-16 ***
regionSeattle -0.118462 0.019094 -6.204 5.62e-10 ***
regionSouthCarolina -0.157751 0.019094 -8.262 < 2e-16 ***
regionSouthCentral -0.459793 0.019094 -24.081 < 2e-16 ***
regionSoutheast -0.163018 0.019094 -8.538 < 2e-16 ***
regionSpokane -0.115444 0.019094 -6.046 1.51e-09 ***
regionStLouis -0.130414 0.019094 -6.830 8.75e-12 ***
regionSyracuse -0.040710 0.019094 -2.132 0.033011 *
regionTampa -0.152189 0.019094 -7.971 1.67e-15 ***
regionTotalUS -0.242012 0.019094 -12.675 < 2e-16 ***
regionWest -0.288817 0.019094 -15.126 < 2e-16 ***
regionWestTexNewMexico -0.296624 0.019137 -15.500 < 2e-16 ***
quarter2 0.081121 0.005410 14.996 < 2e-16 ***
quarter3 0.218901 0.005409 40.471 < 2e-16 ***
quarter4 0.161972 0.005376 30.130 < 2e-16 ***
year2016 -0.036978 0.004684 -7.894 3.10e-15 ***
year2017 0.138658 0.004663 29.735 < 2e-16 ***
year2018 0.087412 0.008334 10.488 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2482 on 18188 degrees of freedom
Multiple R-squared: 0.6213, Adjusted R-squared: 0.62
F-statistic: 497.3 on 60 and 18188 DF, p-value: < 2.2e-16
Hmm, model4b, with type, region, quarter and year, wins here: adjusted R-squared is 0.62, against 0.5866 for model4a.
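Since model3c is nested in model4b (the latter only adds year), the improvement could also be checked with a partial F-test through `anova()`. The sketch below illustrates the idea on stand-in data (`mtcars`); the variables are illustrative and are not the ones used in this analysis.

```r
# Stand-in data (mtcars), NOT the avocado data.
# 'smaller' is nested in 'larger': the larger model adds one term (hp).
smaller <- lm(mpg ~ wt + factor(cyl), data = mtcars)
larger  <- lm(mpg ~ wt + factor(cyl) + hp, data = mtcars)

# Partial F-test: does the extra term reduce the residual sum of squares
# by more than would be expected by chance?
f_test <- anova(smaller, larger)
print(f_test)

# The p-value for the added term sits in the second row of the table.
p_value <- f_test[["Pr(>F)"]][2]
```

For the models above, the analogous call would be `anova(model3c, model4b)`, assuming both were fitted to the same rows of `trimmed_avocados`.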
avocados_remaining_resid <- trimmed_avocados %>%
add_residuals(model4b) %>%
dplyr::select(-c("average_price", "type", "region", "quarter", "year"))
ggpairs(avocados_remaining_resid)
ggsave("pairs_plot_choice5.png", width = 10, height = 10, units = "in")
It looks like x_large_bags is the remaining contender, so let’s check it out!
model5 <- lm(average_price ~ type + region + quarter + year + x_large_bags, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model5)
summary(model5)
Call:
lm(formula = average_price ~ type + region + quarter + year +
x_large_bags, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.03610 -0.14545 -0.00439 0.14420 1.43907
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.167e+00 1.429e-02 81.687 < 2e-16 ***
typeorganic 4.982e-01 3.755e-03 132.674 < 2e-16 ***
regionAtlanta -2.233e-01 1.909e-02 -11.698 < 2e-16 ***
regionBaltimoreWashington -2.698e-02 1.909e-02 -1.413 0.157614
regionBoise -2.129e-01 1.909e-02 -11.151 < 2e-16 ***
regionBoston -3.019e-02 1.909e-02 -1.582 0.113769
regionBuffaloRochester -4.424e-02 1.909e-02 -2.318 0.020485 *
regionCalifornia -1.713e-01 1.919e-02 -8.925 < 2e-16 ***
regionCharlotte 4.497e-02 1.909e-02 2.356 0.018493 *
regionChicago -4.616e-03 1.909e-02 -0.242 0.808941
regionCincinnatiDayton -3.521e-01 1.909e-02 -18.442 < 2e-16 ***
regionColumbus -3.084e-01 1.909e-02 -16.157 < 2e-16 ***
regionDallasFtWorth -4.759e-01 1.909e-02 -24.926 < 2e-16 ***
regionDenver -3.425e-01 1.909e-02 -17.940 < 2e-16 ***
regionDetroit -2.866e-01 1.910e-02 -15.008 < 2e-16 ***
regionGrandRapids -5.688e-02 1.909e-02 -2.979 0.002894 **
regionGreatLakes -2.292e-01 1.923e-02 -11.918 < 2e-16 ***
regionHarrisburgScranton -4.787e-02 1.909e-02 -2.508 0.012166 *
regionHartfordSpringfield 2.576e-01 1.909e-02 13.492 < 2e-16 ***
regionHouston -5.134e-01 1.909e-02 -26.894 < 2e-16 ***
regionIndianapolis -2.473e-01 1.909e-02 -12.954 < 2e-16 ***
regionJacksonville -5.015e-02 1.909e-02 -2.627 0.008615 **
regionLasVegas -1.801e-01 1.909e-02 -9.434 < 2e-16 ***
regionLosAngeles -3.493e-01 1.915e-02 -18.243 < 2e-16 ***
regionLouisville -2.744e-01 1.909e-02 -14.375 < 2e-16 ***
regionMiamiFtLauderdale -1.328e-01 1.909e-02 -6.958 3.58e-12 ***
regionMidsouth -1.577e-01 1.910e-02 -8.257 < 2e-16 ***
regionNashville -3.490e-01 1.909e-02 -18.282 < 2e-16 ***
regionNewOrleansMobile -2.567e-01 1.909e-02 -13.448 < 2e-16 ***
regionNewYork 1.662e-01 1.909e-02 8.706 < 2e-16 ***
regionNortheast 3.955e-02 1.910e-02 2.071 0.038381 *
regionNorthernNewEngland -8.371e-02 1.909e-02 -4.385 1.17e-05 ***
regionOrlando -5.503e-02 1.909e-02 -2.883 0.003945 **
regionPhiladelphia 7.103e-02 1.909e-02 3.721 0.000199 ***
regionPhoenixTucson -3.367e-01 1.909e-02 -17.638 < 2e-16 ***
regionPittsburgh -1.967e-01 1.909e-02 -10.305 < 2e-16 ***
regionPlains -1.257e-01 1.909e-02 -6.581 4.80e-11 ***
regionPortland -2.434e-01 1.909e-02 -12.748 < 2e-16 ***
regionRaleighGreensboro -5.972e-03 1.909e-02 -0.313 0.754415
regionRichmondNorfolk -2.698e-01 1.909e-02 -14.132 < 2e-16 ***
regionRoanoke -3.131e-01 1.909e-02 -16.404 < 2e-16 ***
regionSacramento 6.036e-02 1.909e-02 3.162 0.001571 **
regionSanDiego -1.630e-01 1.909e-02 -8.537 < 2e-16 ***
regionSanFrancisco 2.430e-01 1.909e-02 12.728 < 2e-16 ***
regionSeattle -1.185e-01 1.909e-02 -6.207 5.52e-10 ***
regionSouthCarolina -1.579e-01 1.909e-02 -8.274 < 2e-16 ***
regionSouthCentral -4.625e-01 1.911e-02 -24.199 < 2e-16 ***
regionSoutheast -1.656e-01 1.911e-02 -8.667 < 2e-16 ***
regionSpokane -1.154e-01 1.909e-02 -6.045 1.52e-09 ***
regionStLouis -1.306e-01 1.909e-02 -6.842 8.08e-12 ***
regionSyracuse -4.071e-02 1.909e-02 -2.132 0.032984 *
regionTampa -1.524e-01 1.909e-02 -7.983 1.52e-15 ***
regionTotalUS -2.647e-01 2.066e-02 -12.815 < 2e-16 ***
regionWest -2.897e-01 1.909e-02 -15.171 < 2e-16 ***
regionWestTexNewMexico -2.969e-01 1.913e-02 -15.518 < 2e-16 ***
quarter2 8.058e-02 5.412e-03 14.891 < 2e-16 ***
quarter3 2.181e-01 5.414e-03 40.293 < 2e-16 ***
quarter4 1.621e-01 5.375e-03 30.154 < 2e-16 ***
year2016 -3.791e-02 4.695e-03 -8.075 7.16e-16 ***
year2017 1.375e-01 4.680e-03 29.381 < 2e-16 ***
year2018 8.547e-02 8.360e-03 10.223 < 2e-16 ***
x_large_bags 3.583e-07 1.246e-07 2.877 0.004025 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2482 on 18187 degrees of freedom
Multiple R-squared: 0.6214, Adjusted R-squared: 0.6202
F-statistic: 489.4 on 61 and 18187 DF, p-value: < 2.2e-16
It is a significant explanatory variable (p = 0.004), so let’s keep it, although adjusted R-squared barely moves (0.62 to 0.6202). Overall, we still have some heteroscedasticity and deviations from normality in the residuals.
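The heteroscedasticity read off the diagnostic plots can be backed up with a formal test. What follows is a rough sketch of the studentized (Koenker) Breusch-Pagan test written in base R only; `bp_test` is a hypothetical helper name, and `mtcars` is used as stand-in data. It assumes the model includes an intercept.

```r
# Hypothetical helper: studentized (Koenker) Breusch-Pagan test.
# Regress squared residuals on the model's own regressors; under
# homoscedasticity n * R^2 is approximately chi-squared.
bp_test <- function(model) {
  X   <- model.matrix(model)            # includes the intercept column
  e2  <- residuals(model)^2
  aux <- lm.fit(X, e2)                  # auxiliary regression
  r2  <- 1 - sum(aux$residuals^2) / sum((e2 - mean(e2))^2)
  stat <- length(e2) * r2
  df   <- aux$rank - 1                  # regressors excluding the intercept
  list(statistic = stat, df = df,
       p.value = pchisq(stat, df, lower.tail = FALSE))
}

# Stand-in data (mtcars), NOT the avocado data.
fit <- lm(mpg ~ wt + hp, data = mtcars)
bp  <- bp_test(fit)
print(bp)
```

A small p-value would point to non-constant residual variance. With over 18,000 observations, as here, even mild departures are likely to come out significant, so the plots remain the better guide to how much it matters.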
Let’s now think about possible pairwise interactions: with five main-effect variables there are ten possible pairs. Let’s test them out.
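The count of ten comes from choosing 2 of the 5 main effects, `choose(5, 2)`. The pairs and their model formulas can be enumerated rather than typed by hand. This is only a sketch: it builds the formulas and fits nothing, and the object names are illustrative.

```r
# The five main effects currently in the model.
main_effects <- c("type", "region", "quarter", "year", "x_large_bags")

# All unordered pairs: a 2 x 10 character matrix, one pair per column.
pair_matrix <- combn(main_effects, 2)

# Turn each pair into an interaction term such as "type:region".
interaction_terms <- apply(pair_matrix, 2, paste, collapse = ":")

# One formula per candidate: all main effects plus a single interaction.
base_rhs <- paste(main_effects, collapse = " + ")
formulas <- lapply(interaction_terms, function(term) {
  as.formula(paste("average_price ~", base_rhs, "+", term))
})
names(formulas) <- interaction_terms

length(formulas)
formulas[["type:region"]]
```

Each element could then be passed to `lm(f, data = trimmed_avocados)`, which reproduces the models fitted one by one below.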
model5pa <- lm(average_price ~ type + region + quarter + year + x_large_bags + type:region, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model5pa)
summary(model5pa)
Call:
lm(formula = average_price ~ type + region + quarter + year +
x_large_bags + type:region, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.00812 -0.13347 -0.00249 0.13359 1.48016
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.203e+00 1.855e-02 64.874 < 2e-16 ***
typeorganic 4.246e-01 2.558e-02 16.598 < 2e-16 ***
regionAtlanta -2.801e-01 2.558e-02 -10.950 < 2e-16 ***
regionBaltimoreWashington -4.684e-03 2.558e-02 -0.183 0.854724
regionBoise -2.727e-01 2.558e-02 -10.660 < 2e-16 ***
regionBoston -4.441e-02 2.558e-02 -1.736 0.082557 .
regionBuffaloRochester 3.352e-02 2.558e-02 1.310 0.190080
regionCalifornia -2.474e-01 2.600e-02 -9.516 < 2e-16 ***
regionCharlotte -7.369e-02 2.558e-02 -2.881 0.003973 **
regionChicago 2.033e-02 2.558e-02 0.795 0.426797
regionCincinnatiDayton -3.334e-01 2.558e-02 -13.034 < 2e-16 ***
regionColumbus -2.826e-01 2.558e-02 -11.048 < 2e-16 ***
regionDallasFtWorth -5.026e-01 2.558e-02 -19.647 < 2e-16 ***
regionDenver -2.748e-01 2.558e-02 -10.743 < 2e-16 ***
regionDetroit -2.260e-01 2.562e-02 -8.823 < 2e-16 ***
regionGrandRapids -2.435e-02 2.559e-02 -0.951 0.341382
regionGreatLakes -1.718e-01 2.619e-02 -6.560 5.54e-11 ***
regionHarrisburgScranton -9.003e-02 2.558e-02 -3.519 0.000434 ***
regionHartfordSpringfield 5.926e-02 2.558e-02 2.317 0.020528 *
regionHouston -5.239e-01 2.558e-02 -20.479 < 2e-16 ***
regionIndianapolis -2.041e-01 2.558e-02 -7.978 1.57e-15 ***
regionJacksonville -1.552e-01 2.558e-02 -6.067 1.33e-09 ***
regionLasVegas -3.358e-01 2.558e-02 -13.126 < 2e-16 ***
regionLosAngeles -3.755e-01 2.583e-02 -14.536 < 2e-16 ***
regionLouisville -2.435e-01 2.558e-02 -9.518 < 2e-16 ***
regionMiamiFtLauderdale -9.464e-02 2.558e-02 -3.700 0.000217 ***
regionMidsouth -1.426e-01 2.561e-02 -5.570 2.58e-08 ***
regionNashville -3.359e-01 2.558e-02 -13.132 < 2e-16 ***
regionNewOrleansMobile -2.639e-01 2.558e-02 -10.313 < 2e-16 ***
regionNewYork 5.313e-02 2.558e-02 2.077 0.037842 *
regionNortheast -5.307e-03 2.560e-02 -0.207 0.835817
regionNorthernNewEngland -8.857e-02 2.558e-02 -3.463 0.000536 ***
regionOrlando -1.345e-01 2.558e-02 -5.257 1.48e-07 ***
regionPhiladelphia 4.753e-02 2.558e-02 1.858 0.063204 .
regionPhoenixTucson -6.206e-01 2.558e-02 -24.261 < 2e-16 ***
regionPittsburgh -9.812e-02 2.558e-02 -3.836 0.000126 ***
regionPlains -1.841e-01 2.560e-02 -7.192 6.66e-13 ***
regionPortland -3.023e-01 2.558e-02 -11.817 < 2e-16 ***
regionRaleighGreensboro -1.217e-01 2.558e-02 -4.757 1.98e-06 ***
regionRichmondNorfolk -2.290e-01 2.558e-02 -8.952 < 2e-16 ***
regionRoanoke -2.528e-01 2.558e-02 -9.881 < 2e-16 ***
regionSacramento -7.492e-02 2.558e-02 -2.929 0.003407 **
regionSanDiego -2.874e-01 2.558e-02 -11.233 < 2e-16 ***
regionSanFrancisco 4.827e-02 2.558e-02 1.887 0.059175 .
regionSeattle -1.790e-01 2.558e-02 -6.998 2.69e-12 ***
regionSouthCarolina -2.027e-01 2.558e-02 -7.923 2.44e-15 ***
regionSouthCentral -4.814e-01 2.568e-02 -18.742 < 2e-16 ***
regionSoutheast -1.877e-01 2.567e-02 -7.310 2.79e-13 ***
regionSpokane -2.328e-01 2.558e-02 -9.099 < 2e-16 ***
regionStLouis -1.632e-01 2.558e-02 -6.378 1.84e-10 ***
regionSyracuse 3.817e-02 2.558e-02 1.492 0.135705
regionTampa -1.473e-01 2.558e-02 -5.759 8.62e-09 ***
regionTotalUS -2.734e-01 3.186e-02 -8.583 < 2e-16 ***
regionWest -3.643e-01 2.559e-02 -14.235 < 2e-16 ***
regionWestTexNewMexico -5.068e-01 2.558e-02 -19.813 < 2e-16 ***
quarter2 8.101e-02 5.129e-03 15.793 < 2e-16 ***
quarter3 2.186e-01 5.134e-03 42.587 < 2e-16 ***
quarter4 1.620e-01 5.093e-03 31.820 < 2e-16 ***
year2016 -3.735e-02 4.455e-03 -8.385 < 2e-16 ***
year2017 1.383e-01 4.444e-03 31.110 < 2e-16 ***
year2018 8.670e-02 7.937e-03 10.923 < 2e-16 ***
x_large_bags 1.318e-07 1.499e-07 0.879 0.379416
typeorganic:regionAtlanta 1.139e-01 3.618e-02 3.149 0.001642 **
typeorganic:regionBaltimoreWashington -4.437e-02 3.618e-02 -1.226 0.220035
typeorganic:regionBoise 1.196e-01 3.618e-02 3.307 0.000946 ***
typeorganic:regionBoston 2.849e-02 3.618e-02 0.788 0.430916
typeorganic:regionBuffaloRochester -1.555e-01 3.618e-02 -4.298 1.74e-05 ***
typeorganic:regionCalifornia 1.593e-01 3.647e-02 4.367 1.27e-05 ***
typeorganic:regionCharlotte 2.374e-01 3.618e-02 6.561 5.48e-11 ***
typeorganic:regionChicago -4.944e-02 3.618e-02 -1.367 0.171744
typeorganic:regionCincinnatiDayton -3.699e-02 3.618e-02 -1.022 0.306593
typeorganic:regionColumbus -5.140e-02 3.618e-02 -1.421 0.155386
typeorganic:regionDallasFtWorth 5.403e-02 3.618e-02 1.493 0.135327
typeorganic:regionDenver -1.353e-01 3.618e-02 -3.741 0.000184 ***
typeorganic:regionDetroit -1.190e-01 3.620e-02 -3.288 0.001010 **
typeorganic:regionGrandRapids -6.400e-02 3.618e-02 -1.769 0.076968 .
typeorganic:regionGreatLakes -1.063e-01 3.661e-02 -2.903 0.003698 **
typeorganic:regionHarrisburgScranton 8.447e-02 3.618e-02 2.335 0.019563 *
typeorganic:regionHartfordSpringfield 3.967e-01 3.618e-02 10.965 < 2e-16 ***
typeorganic:regionHouston 2.134e-02 3.618e-02 0.590 0.555192
typeorganic:regionIndianapolis -8.609e-02 3.618e-02 -2.380 0.017343 *
typeorganic:regionJacksonville 2.102e-01 3.618e-02 5.810 6.37e-09 ***
typeorganic:regionLasVegas 3.113e-01 3.618e-02 8.606 < 2e-16 ***
typeorganic:regionLosAngeles 5.770e-02 3.635e-02 1.587 0.112476
typeorganic:regionLouisville -6.178e-02 3.618e-02 -1.708 0.087678 .
typeorganic:regionMiamiFtLauderdale -7.601e-02 3.618e-02 -2.101 0.035652 *
typeorganic:regionMidsouth -2.831e-02 3.620e-02 -0.782 0.434169
typeorganic:regionNashville -2.610e-02 3.618e-02 -0.721 0.470616
typeorganic:regionNewOrleansMobile 1.486e-02 3.618e-02 0.411 0.681207
typeorganic:regionNewYork 2.266e-01 3.618e-02 6.263 3.86e-10 ***
typeorganic:regionNortheast 9.140e-02 3.619e-02 2.525 0.011567 *
typeorganic:regionNorthernNewEngland 9.816e-03 3.618e-02 0.271 0.786139
typeorganic:regionOrlando 1.591e-01 3.618e-02 4.399 1.09e-05 ***
typeorganic:regionPhiladelphia 4.709e-02 3.618e-02 1.302 0.193037
typeorganic:regionPhoenixTucson 5.680e-01 3.618e-02 15.700 < 2e-16 ***
typeorganic:regionPittsburgh -1.972e-01 3.618e-02 -5.451 5.06e-08 ***
typeorganic:regionPlains 1.183e-01 3.619e-02 3.269 0.001082 **
typeorganic:regionPortland 1.179e-01 3.618e-02 3.259 0.001120 **
typeorganic:regionRaleighGreensboro 2.315e-01 3.618e-02 6.400 1.59e-10 ***
typeorganic:regionRichmondNorfolk -8.148e-02 3.618e-02 -2.252 0.024322 *
typeorganic:regionRoanoke -1.207e-01 3.618e-02 -3.338 0.000847 ***
typeorganic:regionSacramento 2.708e-01 3.618e-02 7.485 7.48e-14 ***
typeorganic:regionSanDiego 2.489e-01 3.618e-02 6.880 6.18e-12 ***
typeorganic:regionSanFrancisco 3.897e-01 3.618e-02 10.771 < 2e-16 ***
typeorganic:regionSeattle 1.211e-01 3.618e-02 3.347 0.000819 ***
typeorganic:regionSouthCarolina 8.973e-02 3.618e-02 2.480 0.013136 *
typeorganic:regionSouthCentral 4.114e-02 3.625e-02 1.135 0.256458
typeorganic:regionSoutheast 4.737e-02 3.624e-02 1.307 0.191198
typeorganic:regionSpokane 2.346e-01 3.618e-02 6.486 9.03e-11 ***
typeorganic:regionStLouis 6.535e-02 3.618e-02 1.806 0.070875 .
typeorganic:regionSyracuse -1.578e-01 3.618e-02 -4.361 1.30e-05 ***
typeorganic:regionTampa -9.910e-03 3.618e-02 -0.274 0.784145
typeorganic:regionTotalUS 4.616e-02 4.086e-02 1.130 0.258597
typeorganic:regionWest 1.503e-01 3.618e-02 4.154 3.28e-05 ***
typeorganic:regionWestTexNewMexico 4.234e-01 3.626e-02 11.676 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2351 on 18134 degrees of freedom
Multiple R-squared: 0.6611, Adjusted R-squared: 0.659
F-statistic: 310.3 on 114 and 18134 DF, p-value: < 2.2e-16
model5pb <- lm(average_price ~ type + region + quarter + year + x_large_bags + type:quarter, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model5pb)
summary(model5pb)
Call:
lm(formula = average_price ~ type + region + quarter + year +
x_large_bags + type:quarter, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.02270 -0.14602 -0.00362 0.14398 1.44165
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.180e+00 1.454e-02 81.176 < 2e-16 ***
typeorganic 4.717e-01 6.719e-03 70.203 < 2e-16 ***
regionAtlanta -2.233e-01 1.907e-02 -11.713 < 2e-16 ***
regionBaltimoreWashington -2.699e-02 1.907e-02 -1.416 0.156893
regionBoise -2.129e-01 1.907e-02 -11.163 < 2e-16 ***
regionBoston -3.020e-02 1.907e-02 -1.584 0.113308
regionBuffaloRochester -4.425e-02 1.907e-02 -2.320 0.020331 *
regionCalifornia -1.718e-01 1.917e-02 -8.962 < 2e-16 ***
regionCharlotte 4.497e-02 1.907e-02 2.358 0.018367 *
regionChicago -4.649e-03 1.907e-02 -0.244 0.807387
regionCincinnatiDayton -3.521e-01 1.907e-02 -18.465 < 2e-16 ***
regionColumbus -3.085e-01 1.907e-02 -16.177 < 2e-16 ***
regionDallasFtWorth -4.759e-01 1.907e-02 -24.957 < 2e-16 ***
regionDenver -3.425e-01 1.907e-02 -17.960 < 2e-16 ***
regionDetroit -2.868e-01 1.908e-02 -15.034 < 2e-16 ***
regionGrandRapids -5.696e-02 1.907e-02 -2.987 0.002824 **
regionGreatLakes -2.298e-01 1.921e-02 -11.964 < 2e-16 ***
regionHarrisburgScranton -4.788e-02 1.907e-02 -2.511 0.012048 *
regionHartfordSpringfield 2.576e-01 1.907e-02 13.508 < 2e-16 ***
regionHouston -5.134e-01 1.907e-02 -26.926 < 2e-16 ***
regionIndianapolis -2.473e-01 1.907e-02 -12.970 < 2e-16 ***
regionJacksonville -5.016e-02 1.907e-02 -2.631 0.008531 **
regionLasVegas -1.801e-01 1.907e-02 -9.444 < 2e-16 ***
regionLosAngeles -3.497e-01 1.913e-02 -18.284 < 2e-16 ***
regionLouisville -2.744e-01 1.907e-02 -14.392 < 2e-16 ***
regionMiamiFtLauderdale -1.328e-01 1.907e-02 -6.967 3.35e-12 ***
regionMidsouth -1.578e-01 1.907e-02 -8.274 < 2e-16 ***
regionNashville -3.490e-01 1.907e-02 -18.303 < 2e-16 ***
regionNewOrleansMobile -2.568e-01 1.907e-02 -13.466 < 2e-16 ***
regionNewYork 1.662e-01 1.907e-02 8.714 < 2e-16 ***
regionNortheast 3.942e-02 1.907e-02 2.067 0.038772 *
regionNorthernNewEngland -8.372e-02 1.907e-02 -4.390 1.14e-05 ***
regionOrlando -5.505e-02 1.907e-02 -2.887 0.003892 **
regionPhiladelphia 7.102e-02 1.907e-02 3.725 0.000196 ***
regionPhoenixTucson -3.367e-01 1.907e-02 -17.659 < 2e-16 ***
regionPittsburgh -1.967e-01 1.907e-02 -10.317 < 2e-16 ***
regionPlains -1.258e-01 1.907e-02 -6.594 4.39e-11 ***
regionPortland -2.434e-01 1.907e-02 -12.762 < 2e-16 ***
regionRaleighGreensboro -5.977e-03 1.907e-02 -0.313 0.753941
regionRichmondNorfolk -2.698e-01 1.907e-02 -14.149 < 2e-16 ***
regionRoanoke -3.131e-01 1.907e-02 -16.423 < 2e-16 ***
regionSacramento 6.034e-02 1.907e-02 3.164 0.001556 **
regionSanDiego -1.630e-01 1.907e-02 -8.548 < 2e-16 ***
regionSanFrancisco 2.430e-01 1.907e-02 12.742 < 2e-16 ***
regionSeattle -1.185e-01 1.907e-02 -6.214 5.28e-10 ***
regionSouthCarolina -1.580e-01 1.907e-02 -8.284 < 2e-16 ***
regionSouthCentral -4.628e-01 1.909e-02 -24.240 < 2e-16 ***
regionSoutheast -1.659e-01 1.909e-02 -8.690 < 2e-16 ***
regionSpokane -1.154e-01 1.907e-02 -6.052 1.46e-09 ***
regionStLouis -1.306e-01 1.907e-02 -6.850 7.60e-12 ***
regionSyracuse -4.071e-02 1.907e-02 -2.135 0.032785 *
regionTampa -1.524e-01 1.907e-02 -7.993 1.40e-15 ***
regionTotalUS -2.668e-01 2.064e-02 -12.928 < 2e-16 ***
regionWest -2.897e-01 1.907e-02 -15.193 < 2e-16 ***
regionWestTexNewMexico -2.969e-01 1.911e-02 -15.537 < 2e-16 ***
quarter2 6.536e-02 7.416e-03 8.814 < 2e-16 ***
quarter3 1.848e-01 7.423e-03 24.898 < 2e-16 ***
quarter4 1.530e-01 7.364e-03 20.776 < 2e-16 ***
year2016 -3.800e-02 4.689e-03 -8.102 5.72e-16 ***
year2017 1.374e-01 4.674e-03 29.392 < 2e-16 ***
year2018 8.529e-02 8.351e-03 10.213 < 2e-16 ***
x_large_bags 3.916e-07 1.246e-07 3.142 0.001682 **
typeorganic:quarter2 3.034e-02 1.015e-02 2.989 0.002800 **
typeorganic:quarter3 6.653e-02 1.015e-02 6.553 5.80e-11 ***
typeorganic:quarter4 1.817e-02 1.008e-02 1.803 0.071446 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2479 on 18184 degrees of freedom
Multiple R-squared: 0.6224, Adjusted R-squared: 0.621
F-statistic: 468.3 on 64 and 18184 DF, p-value: < 2.2e-16
model5pc <- lm(average_price ~ type + region + quarter + year + x_large_bags + type:year, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model5pc)
summary(model5pc)
Call:
lm(formula = average_price ~ type + region + quarter + year +
x_large_bags + type:year, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.00898 -0.14443 -0.00472 0.13873 1.46680
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.118e+00 1.442e-02 77.501 < 2e-16 ***
typeorganic 5.956e-01 6.569e-03 90.667 < 2e-16 ***
regionAtlanta -2.232e-01 1.892e-02 -11.796 < 2e-16 ***
regionBaltimoreWashington -2.687e-02 1.892e-02 -1.420 0.155567
regionBoise -2.129e-01 1.892e-02 -11.252 < 2e-16 ***
regionBoston -3.016e-02 1.892e-02 -1.594 0.110873
regionBuffaloRochester -4.422e-02 1.892e-02 -2.337 0.019445 *
regionCalifornia -1.678e-01 1.902e-02 -8.823 < 2e-16 ***
regionCharlotte 4.499e-02 1.892e-02 2.378 0.017419 *
regionChicago -4.393e-03 1.892e-02 -0.232 0.816388
regionCincinnatiDayton -3.519e-01 1.892e-02 -18.601 < 2e-16 ***
regionColumbus -3.083e-01 1.892e-02 -16.297 < 2e-16 ***
regionDallasFtWorth -4.756e-01 1.892e-02 -25.137 < 2e-16 ***
regionDenver -3.425e-01 1.892e-02 -18.101 < 2e-16 ***
regionDetroit -2.856e-01 1.893e-02 -15.087 < 2e-16 ***
regionGrandRapids -5.635e-02 1.892e-02 -2.978 0.002904 **
regionGreatLakes -2.250e-01 1.906e-02 -11.803 < 2e-16 ***
regionHarrisburgScranton -4.780e-02 1.892e-02 -2.526 0.011537 *
regionHartfordSpringfield 2.576e-01 1.892e-02 13.615 < 2e-16 ***
regionHouston -5.132e-01 1.892e-02 -27.126 < 2e-16 ***
regionIndianapolis -2.471e-01 1.892e-02 -13.062 < 2e-16 ***
regionJacksonville -5.011e-02 1.892e-02 -2.649 0.008085 **
regionLasVegas -1.801e-01 1.892e-02 -9.520 < 2e-16 ***
regionLosAngeles -3.466e-01 1.898e-02 -18.265 < 2e-16 ***
regionLouisville -2.744e-01 1.892e-02 -14.502 < 2e-16 ***
regionMiamiFtLauderdale -1.326e-01 1.892e-02 -7.011 2.45e-12 ***
regionMidsouth -1.568e-01 1.893e-02 -8.285 < 2e-16 ***
regionNashville -3.490e-01 1.892e-02 -18.445 < 2e-16 ***
regionNewOrleansMobile -2.564e-01 1.892e-02 -13.553 < 2e-16 ***
regionNewYork 1.664e-01 1.892e-02 8.796 < 2e-16 ***
regionNortheast 4.039e-02 1.893e-02 2.134 0.032855 *
regionNorthernNewEngland -8.367e-02 1.892e-02 -4.422 9.83e-06 ***
regionOrlando -5.490e-02 1.892e-02 -2.902 0.003714 **
regionPhiladelphia 7.107e-02 1.892e-02 3.756 0.000173 ***
regionPhoenixTucson -3.366e-01 1.892e-02 -17.793 < 2e-16 ***
regionPittsburgh -1.967e-01 1.892e-02 -10.398 < 2e-16 ***
regionPlains -1.249e-01 1.892e-02 -6.603 4.14e-11 ***
regionPortland -2.433e-01 1.892e-02 -12.861 < 2e-16 ***
regionRaleighGreensboro -5.938e-03 1.892e-02 -0.314 0.753649
regionRichmondNorfolk -2.697e-01 1.892e-02 -14.257 < 2e-16 ***
regionRoanoke -3.131e-01 1.892e-02 -16.550 < 2e-16 ***
regionSacramento 6.047e-02 1.892e-02 3.196 0.001396 **
regionSanDiego -1.629e-01 1.892e-02 -8.611 < 2e-16 ***
regionSanFrancisco 2.431e-01 1.892e-02 12.849 < 2e-16 ***
regionSeattle -1.185e-01 1.892e-02 -6.262 3.89e-10 ***
regionSouthCarolina -1.578e-01 1.892e-02 -8.342 < 2e-16 ***
regionSouthCentral -4.608e-01 1.894e-02 -24.326 < 2e-16 ***
regionSoutheast -1.640e-01 1.894e-02 -8.658 < 2e-16 ***
regionSpokane -1.154e-01 1.892e-02 -6.101 1.07e-09 ***
regionStLouis -1.305e-01 1.892e-02 -6.897 5.49e-12 ***
regionSyracuse -4.071e-02 1.892e-02 -2.152 0.031432 *
regionTampa -1.523e-01 1.892e-02 -8.048 8.93e-16 ***
regionTotalUS -2.505e-01 2.049e-02 -12.226 < 2e-16 ***
regionWest -2.891e-01 1.892e-02 -15.280 < 2e-16 ***
regionWestTexNewMexico -2.967e-01 1.896e-02 -15.650 < 2e-16 ***
quarter2 8.091e-02 5.363e-03 15.085 < 2e-16 ***
quarter3 2.186e-01 5.366e-03 40.744 < 2e-16 ***
quarter4 1.620e-01 5.327e-03 30.417 < 2e-16 ***
year2016 2.694e-02 6.596e-03 4.084 4.45e-05 ***
year2017 2.152e-01 6.582e-03 32.691 < 2e-16 ***
year2018 1.641e-01 1.128e-02 14.549 < 2e-16 ***
x_large_bags 1.338e-07 1.241e-07 1.078 0.281087
typeorganic:year2016 -1.285e-01 9.306e-03 -13.813 < 2e-16 ***
typeorganic:year2017 -1.540e-01 9.275e-03 -16.600 < 2e-16 ***
typeorganic:year2018 -1.548e-01 1.520e-02 -10.184 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.246 on 18184 degrees of freedom
Multiple R-squared: 0.6282, Adjusted R-squared: 0.6269
F-statistic: 480.1 on 64 and 18184 DF, p-value: < 2.2e-16
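The summary above gives a separate t-test for each typeorganic:year level. To judge the interaction as a whole, one option is a partial F-test between the nested fits. The sketch below is only an illustration: it uses the built-in mtcars data as a stand-in because trimmed_avocados is not reproduced here, and the names cars, fit_main and fit_int are hypothetical.

```r
# Stand-in data: mtcars ships with R; two columns are turned into factors
# to mimic categorical predictors such as type and year.
cars <- transform(mtcars, am = factor(am), cyl = factor(cyl))

# Main-effects model, then the same model plus an am:cyl interaction.
fit_main <- lm(mpg ~ am + cyl + wt, data = cars)
fit_int  <- update(fit_main, . ~ . + am:cyl)

# Partial F-test: does the interaction term, taken as a whole,
# reduce the residual sum of squares by more than chance would?
cmp <- anova(fit_main, fit_int)
print(cmp)
```

In the notebook the same call would take the main-effects fit and the interaction fit as its two arguments, provided both were fitted to the same rows.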
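# Main effects plus a type:x_large_bags interaction, so the x_large_bags slope can differ between conventional and organic.
model5pd <- lm(average_price ~ type + region + quarter + year + x_large_bags + type:x_large_bags, data = trimmed_avocados)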
par(mfrow = c(2, 2))
plot(model5pd)
summary(model5pd)
Call:
lm(formula = average_price ~ type + region + quarter + year +
x_large_bags + type:x_large_bags, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.03574 -0.14591 -0.00478 0.14434 1.43935
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.168e+00 1.429e-02 81.734 < 2e-16 ***
typeorganic 4.978e-01 3.757e-03 132.483 < 2e-16 ***
regionAtlanta -2.233e-01 1.909e-02 -11.701 < 2e-16 ***
regionBaltimoreWashington -2.699e-02 1.909e-02 -1.414 0.157339
regionBoise -2.130e-01 1.909e-02 -11.159 < 2e-16 ***
regionBoston -3.020e-02 1.909e-02 -1.582 0.113671
regionBuffaloRochester -4.425e-02 1.909e-02 -2.318 0.020456 *
regionCalifornia -1.717e-01 1.918e-02 -8.949 < 2e-16 ***
regionCharlotte 4.497e-02 1.909e-02 2.356 0.018481 *
regionChicago -4.644e-03 1.909e-02 -0.243 0.807777
regionCincinnatiDayton -3.521e-01 1.909e-02 -18.446 < 2e-16 ***
regionColumbus -3.085e-01 1.909e-02 -16.160 < 2e-16 ***
regionDallasFtWorth -4.759e-01 1.909e-02 -24.932 < 2e-16 ***
regionDenver -3.425e-01 1.909e-02 -17.943 < 2e-16 ***
regionDetroit -2.868e-01 1.910e-02 -15.017 < 2e-16 ***
regionGrandRapids -5.695e-02 1.909e-02 -2.983 0.002857 **
regionGreatLakes -2.297e-01 1.923e-02 -11.947 < 2e-16 ***
regionHarrisburgScranton -4.788e-02 1.909e-02 -2.508 0.012135 *
regionHartfordSpringfield 2.576e-01 1.909e-02 13.494 < 2e-16 ***
regionHouston -5.134e-01 1.909e-02 -26.899 < 2e-16 ***
regionIndianapolis -2.473e-01 1.909e-02 -12.957 < 2e-16 ***
regionJacksonville -5.016e-02 1.909e-02 -2.628 0.008598 **
regionLasVegas -1.801e-01 1.909e-02 -9.435 < 2e-16 ***
regionLosAngeles -3.496e-01 1.915e-02 -18.263 < 2e-16 ***
regionLouisville -2.744e-01 1.909e-02 -14.377 < 2e-16 ***
regionMiamiFtLauderdale -1.328e-01 1.909e-02 -6.960 3.52e-12 ***
regionMidsouth -1.578e-01 1.909e-02 -8.265 < 2e-16 ***
regionNashville -3.490e-01 1.909e-02 -18.285 < 2e-16 ***
regionNewOrleansMobile -2.568e-01 1.909e-02 -13.453 < 2e-16 ***
regionNewYork 1.662e-01 1.909e-02 8.706 < 2e-16 ***
regionNortheast 3.944e-02 1.909e-02 2.066 0.038871 *
regionNorthernNewEngland -8.372e-02 1.909e-02 -4.386 1.16e-05 ***
regionOrlando -5.505e-02 1.909e-02 -2.884 0.003929 **
regionPhiladelphia 7.102e-02 1.909e-02 3.721 0.000199 ***
regionPhoenixTucson -3.367e-01 1.909e-02 -17.642 < 2e-16 ***
regionPittsburgh -1.967e-01 1.909e-02 -10.307 < 2e-16 ***
regionPlains -1.258e-01 1.909e-02 -6.589 4.56e-11 ***
regionPortland -2.447e-01 1.909e-02 -12.817 < 2e-16 ***
regionRaleighGreensboro -5.976e-03 1.909e-02 -0.313 0.754207
regionRichmondNorfolk -2.698e-01 1.909e-02 -14.135 < 2e-16 ***
regionRoanoke -3.131e-01 1.909e-02 -16.406 < 2e-16 ***
regionSacramento 6.034e-02 1.909e-02 3.161 0.001572 **
regionSanDiego -1.630e-01 1.909e-02 -8.539 < 2e-16 ***
regionSanFrancisco 2.430e-01 1.909e-02 12.730 < 2e-16 ***
regionSeattle -1.212e-01 1.912e-02 -6.341 2.34e-10 ***
regionSouthCarolina -1.580e-01 1.909e-02 -8.276 < 2e-16 ***
regionSouthCentral -4.628e-01 1.911e-02 -24.214 < 2e-16 ***
regionSoutheast -1.658e-01 1.911e-02 -8.679 < 2e-16 ***
regionSpokane -1.156e-01 1.909e-02 -6.056 1.42e-09 ***
regionStLouis -1.306e-01 1.909e-02 -6.843 7.98e-12 ***
regionSyracuse -4.071e-02 1.909e-02 -2.133 0.032957 *
regionTampa -1.524e-01 1.909e-02 -7.985 1.49e-15 ***
regionTotalUS -2.719e-01 2.084e-02 -13.048 < 2e-16 ***
regionWest -2.951e-01 1.920e-02 -15.366 < 2e-16 ***
regionWestTexNewMexico -2.970e-01 1.913e-02 -15.524 < 2e-16 ***
quarter2 8.054e-02 5.411e-03 14.885 < 2e-16 ***
quarter3 2.180e-01 5.414e-03 40.259 < 2e-16 ***
quarter4 1.616e-01 5.377e-03 30.058 < 2e-16 ***
year2016 -3.798e-02 4.694e-03 -8.092 6.25e-16 ***
year2017 1.370e-01 4.684e-03 29.241 < 2e-16 ***
year2018 8.319e-02 8.405e-03 9.898 < 2e-16 ***
x_large_bags 3.865e-07 1.250e-07 3.091 0.001995 **
typeorganic:x_large_bags 4.737e-04 1.827e-04 2.593 0.009522 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2481 on 18186 degrees of freedom
Multiple R-squared: 0.6216, Adjusted R-squared: 0.6203
F-statistic: 481.8 on 62 and 18186 DF, p-value: < 2.2e-16
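The x_large_bags estimate above is on the order of 1e-07, which suggests the predictor takes large raw values. Dividing a predictor by a constant multiplies its coefficient by that constant and leaves the t value unchanged, which can make the table easier to read. A minimal sketch on simulated data (the variables bags, price and the 1e5 divisor are assumptions chosen for illustration, not values from the notebook):

```r
set.seed(42)
n <- 200

# Simulated stand-in: a large-valued count predictor and a price-like response.
bags  <- runif(n, min = 0, max = 5e5)
price <- 1.2 + 4e-7 * bags + rnorm(n, sd = 0.25)

# Fit once on the raw scale and once in units of 100,000.
fit_raw    <- lm(price ~ bags)
bags_100k  <- bags / 1e5
fit_scaled <- lm(price ~ bags_100k)

# The slope changes by exactly the scaling factor ...
slope_raw    <- unname(coef(fit_raw)["bags"])
slope_scaled <- unname(coef(fit_scaled)["bags_100k"])
print(c(raw = slope_raw, per_100k = slope_scaled))

# ... while the t value for the slope is the same in both fits.
t_raw    <- summary(fit_raw)$coefficients["bags", "t value"]
t_scaled <- summary(fit_scaled)$coefficients["bags_100k", "t value"]
print(c(raw = t_raw, per_100k = t_scaled))
```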
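# Main effects plus a region:quarter interaction, so the quarterly pattern can differ by region (159 extra coefficients).
model5pe <- lm(average_price ~ type + region + quarter + year + x_large_bags + region:quarter, data = trimmed_avocados)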
par(mfrow = c(2, 2))
plot(model5pe)
summary(model5pe)
Call:
lm(formula = average_price ~ type + region + quarter + year +
x_large_bags + region:quarter, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.06468 -0.14582 0.00048 0.14087 1.38018
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.216e+00 2.423e-02 50.190 < 2e-16 ***
typeorganic 4.985e-01 3.663e-03 136.095 < 2e-16 ***
regionAtlanta -2.579e-01 3.388e-02 -7.611 2.85e-14 ***
regionBaltimoreWashington -8.986e-02 3.388e-02 -2.652 0.008000 **
regionBoise -2.854e-01 3.388e-02 -8.424 < 2e-16 ***
regionBoston -7.093e-03 3.388e-02 -0.209 0.834158
regionBuffaloRochester -3.109e-02 3.388e-02 -0.918 0.358774
regionCalifornia -2.868e-01 3.394e-02 -8.450 < 2e-16 ***
regionCharlotte -2.147e-02 3.388e-02 -0.634 0.526347
regionChicago -7.400e-02 3.388e-02 -2.184 0.028945 *
regionCincinnatiDayton -4.353e-01 3.388e-02 -12.849 < 2e-16 ***
regionColumbus -3.251e-01 3.388e-02 -9.595 < 2e-16 ***
regionDallasFtWorth -4.853e-01 3.388e-02 -14.325 < 2e-16 ***
regionDenver -4.216e-01 3.388e-02 -12.443 < 2e-16 ***
regionDetroit -3.074e-01 3.389e-02 -9.071 < 2e-16 ***
regionGrandRapids -1.295e-01 3.388e-02 -3.824 0.000132 ***
regionGreatLakes -2.769e-01 3.398e-02 -8.150 3.88e-16 ***
regionHarrisburgScranton -6.005e-02 3.388e-02 -1.773 0.076319 .
regionHartfordSpringfield 2.290e-01 3.388e-02 6.759 1.43e-11 ***
regionHouston -5.375e-01 3.388e-02 -15.867 < 2e-16 ***
regionIndianapolis -2.742e-01 3.388e-02 -8.093 6.20e-16 ***
regionJacksonville -1.104e-01 3.388e-02 -3.259 0.001121 **
regionLasVegas -2.907e-01 3.388e-02 -8.581 < 2e-16 ***
regionLosAngeles -4.383e-01 3.391e-02 -12.923 < 2e-16 ***
regionLouisville -2.956e-01 3.388e-02 -8.725 < 2e-16 ***
regionMiamiFtLauderdale -1.119e-01 3.388e-02 -3.302 0.000962 ***
regionMidsouth -1.953e-01 3.388e-02 -5.764 8.33e-09 ***
regionNashville -3.514e-01 3.388e-02 -10.372 < 2e-16 ***
regionNewOrleansMobile -3.177e-01 3.388e-02 -9.377 < 2e-16 ***
regionNewYork 1.048e-01 3.388e-02 3.094 0.001979 **
regionNortheast 1.933e-02 3.388e-02 0.570 0.568361
regionNorthernNewEngland -5.982e-02 3.388e-02 -1.766 0.077455 .
regionOrlando -1.034e-01 3.388e-02 -3.053 0.002269 **
regionPhiladelphia 1.650e-02 3.388e-02 0.487 0.626253
regionPhoenixTucson -4.456e-01 3.388e-02 -13.153 < 2e-16 ***
regionPittsburgh -1.745e-01 3.388e-02 -5.151 2.62e-07 ***
regionPlains -1.852e-01 3.388e-02 -5.466 4.67e-08 ***
regionPortland -3.533e-01 3.388e-02 -10.429 < 2e-16 ***
regionRaleighGreensboro -5.803e-02 3.388e-02 -1.713 0.086751 .
regionRichmondNorfolk -2.636e-01 3.388e-02 -7.782 7.52e-15 ***
regionRoanoke -3.123e-01 3.388e-02 -9.217 < 2e-16 ***
regionSacramento -2.741e-02 3.388e-02 -0.809 0.418472
regionSanDiego -2.868e-01 3.388e-02 -8.466 < 2e-16 ***
regionSanFrancisco 9.026e-02 3.388e-02 2.664 0.007726 **
regionSeattle -2.589e-01 3.388e-02 -7.642 2.25e-14 ***
regionSouthCarolina -2.071e-01 3.388e-02 -6.114 9.93e-10 ***
regionSouthCentral -4.798e-01 3.390e-02 -14.153 < 2e-16 ***
regionSoutheast -2.084e-01 3.388e-02 -6.151 7.88e-10 ***
regionSpokane -2.696e-01 3.388e-02 -7.958 1.85e-15 ***
regionStLouis -1.910e-01 3.388e-02 -5.639 1.74e-08 ***
regionSyracuse -2.764e-02 3.388e-02 -0.816 0.414661
regionTampa -1.532e-01 3.388e-02 -4.523 6.14e-06 ***
regionTotalUS -3.151e-01 3.466e-02 -9.091 < 2e-16 ***
regionWest -3.903e-01 3.388e-02 -11.520 < 2e-16 ***
regionWestTexNewMexico -3.665e-01 3.388e-02 -10.818 < 2e-16 ***
quarter2 8.528e-02 3.644e-02 2.341 0.019266 *
quarter3 9.278e-02 3.644e-02 2.546 0.010895 *
quarter4 7.165e-02 3.618e-02 1.981 0.047660 *
year2016 -3.808e-02 4.577e-03 -8.319 < 2e-16 ***
year2017 1.373e-01 4.563e-03 30.081 < 2e-16 ***
year2018 8.513e-02 8.151e-03 10.444 < 2e-16 ***
x_large_bags 4.158e-07 1.233e-07 3.373 0.000746 ***
regionAtlanta:quarter2 -8.875e-02 5.147e-02 -1.725 0.084627 .
regionBaltimoreWashington:quarter2 9.216e-02 5.147e-02 1.791 0.073359 .
regionBoise:quarter2 -9.544e-02 5.147e-02 -1.854 0.063692 .
regionBoston:quarter2 1.139e-02 5.147e-02 0.221 0.824911
regionBuffaloRochester:quarter2 8.166e-02 5.147e-02 1.587 0.112579
regionCalifornia:quarter2 4.240e-03 5.147e-02 0.082 0.934345
regionCharlotte:quarter2 6.218e-02 5.147e-02 1.208 0.226952
regionChicago:quarter2 -4.249e-03 5.147e-02 -0.083 0.934198
regionCincinnatiDayton:quarter2 1.014e-02 5.147e-02 0.197 0.843877
regionColumbus:quarter2 -9.402e-02 5.147e-02 -1.827 0.067727 .
regionDallasFtWorth:quarter2 -7.789e-02 5.147e-02 -1.513 0.130177
regionDenver:quarter2 -1.578e-02 5.147e-02 -0.307 0.759141
regionDetroit:quarter2 -3.691e-02 5.147e-02 -0.717 0.473257
regionGrandRapids:quarter2 1.363e-01 5.147e-02 2.649 0.008086 **
regionGreatLakes:quarter2 -1.091e-02 5.147e-02 -0.212 0.832191
regionHarrisburgScranton:quarter2 6.543e-02 5.147e-02 1.271 0.203625
regionHartfordSpringfield:quarter2 6.725e-02 5.147e-02 1.307 0.191332
regionHouston:quarter2 -8.920e-02 5.147e-02 -1.733 0.083088 .
regionIndianapolis:quarter2 -6.425e-02 5.147e-02 -1.248 0.211928
regionJacksonville:quarter2 2.811e-02 5.147e-02 0.546 0.584928
regionLasVegas:quarter2 -7.424e-02 5.147e-02 -1.443 0.149173
regionLosAngeles:quarter2 -6.060e-02 5.147e-02 -1.177 0.239049
regionLouisville:quarter2 -7.449e-02 5.147e-02 -1.447 0.147834
regionMiamiFtLauderdale:quarter2 -1.020e-02 5.147e-02 -0.198 0.842828
regionMidsouth:quarter2 -1.515e-02 5.147e-02 -0.294 0.768501
regionNashville:quarter2 -1.026e-01 5.147e-02 -1.993 0.046304 *
regionNewOrleansMobile:quarter2 8.341e-02 5.147e-02 1.621 0.105105
regionNewYork:quarter2 8.732e-02 5.147e-02 1.697 0.089772 .
regionNortheast:quarter2 5.500e-02 5.147e-02 1.069 0.285265
regionNorthernNewEngland:quarter2 -6.770e-02 5.147e-02 -1.316 0.188354
regionOrlando:quarter2 1.769e-02 5.147e-02 0.344 0.731089
regionPhiladelphia:quarter2 1.100e-01 5.147e-02 2.137 0.032587 *
regionPhoenixTucson:quarter2 -1.980e-02 5.147e-02 -0.385 0.700459
regionPittsburgh:quarter2 -3.807e-02 5.147e-02 -0.740 0.459513
regionPlains:quarter2 -4.009e-03 5.147e-02 -0.078 0.937911
regionPortland:quarter2 -4.527e-02 5.147e-02 -0.880 0.379084
regionRaleighGreensboro:quarter2 1.832e-03 5.147e-02 0.036 0.971604
regionRichmondNorfolk:quarter2 -1.137e-01 5.147e-02 -2.209 0.027195 *
regionRoanoke:quarter2 -1.312e-01 5.147e-02 -2.550 0.010779 *
regionSacramento:quarter2 8.446e-02 5.147e-02 1.641 0.100786
regionSanDiego:quarter2 -3.285e-03 5.147e-02 -0.064 0.949106
regionSanFrancisco:quarter2 1.221e-01 5.147e-02 2.373 0.017637 *
regionSeattle:quarter2 1.210e-02 5.147e-02 0.235 0.814101
regionSouthCarolina:quarter2 2.735e-02 5.147e-02 0.531 0.595172
regionSouthCentral:quarter2 -7.164e-02 5.147e-02 -1.392 0.163922
regionSoutheast:quarter2 -9.837e-03 5.148e-02 -0.191 0.848456
regionSpokane:quarter2 9.803e-03 5.147e-02 0.190 0.848939
regionStLouis:quarter2 5.672e-02 5.147e-02 1.102 0.270444
regionSyracuse:quarter2 6.494e-02 5.147e-02 1.262 0.207015
regionTampa:quarter2 5.706e-03 5.147e-02 0.111 0.911722
regionTotalUS:quarter2 -1.476e-02 5.149e-02 -0.287 0.774329
regionWest:quarter2 -2.856e-02 5.147e-02 -0.555 0.578953
regionWestTexNewMexico:quarter2 -9.603e-02 5.166e-02 -1.859 0.063053 .
regionAtlanta:quarter3 1.224e-01 5.147e-02 2.378 0.017422 *
regionBaltimoreWashington:quarter3 9.538e-02 5.147e-02 1.853 0.063854 .
regionBoise:quarter3 2.521e-01 5.147e-02 4.898 9.79e-07 ***
regionBoston:quarter3 -1.212e-03 5.147e-02 -0.024 0.981214
regionBuffaloRochester:quarter3 -3.416e-02 5.147e-02 -0.664 0.506909
regionCalifornia:quarter3 2.572e-01 5.147e-02 4.996 5.89e-07 ***
regionCharlotte:quarter3 1.397e-01 5.147e-02 2.715 0.006641 **
regionChicago:quarter3 1.740e-01 5.147e-02 3.381 0.000723 ***
regionCincinnatiDayton:quarter3 2.128e-01 5.147e-02 4.135 3.57e-05 ***
regionColumbus:quarter3 1.094e-01 5.147e-02 2.126 0.033525 *
regionDallasFtWorth:quarter3 2.363e-02 5.147e-02 0.459 0.646184
regionDenver:quarter3 2.124e-01 5.147e-02 4.128 3.68e-05 ***
regionDetroit:quarter3 5.517e-02 5.147e-02 1.072 0.283742
regionGrandRapids:quarter3 9.166e-02 5.147e-02 1.781 0.074936 .
regionGreatLakes:quarter3 1.228e-01 5.147e-02 2.387 0.017003 *
regionHarrisburgScranton:quarter3 6.457e-03 5.147e-02 0.125 0.900153
regionHartfordSpringfield:quarter3 4.942e-02 5.147e-02 0.960 0.336930
regionHouston:quarter3 7.247e-02 5.147e-02 1.408 0.159093
regionIndianapolis:quarter3 9.223e-02 5.147e-02 1.792 0.073138 .
regionJacksonville:quarter3 1.680e-01 5.147e-02 3.265 0.001098 **
regionLasVegas:quarter3 2.954e-01 5.147e-02 5.740 9.61e-09 ***
regionLosAngeles:quarter3 2.150e-01 5.147e-02 4.178 2.96e-05 ***
regionLouisville:quarter3 8.478e-02 5.147e-02 1.647 0.099505 .
regionMiamiFtLauderdale:quarter3 -7.307e-02 5.147e-02 -1.420 0.155672
regionMidsouth:quarter3 9.249e-02 5.147e-02 1.797 0.072360 .
regionNashville:quarter3 4.167e-02 5.147e-02 0.810 0.418085
regionNewOrleansMobile:quarter3 7.109e-02 5.147e-02 1.381 0.167222
regionNewYork:quarter3 1.121e-01 5.147e-02 2.177 0.029476 *
regionNortheast:quarter3 4.725e-02 5.147e-02 0.918 0.358649
regionNorthernNewEngland:quarter3 -1.389e-02 5.147e-02 -0.270 0.787273
regionOrlando:quarter3 1.156e-01 5.147e-02 2.245 0.024762 *
regionPhiladelphia:quarter3 8.202e-02 5.147e-02 1.594 0.111012
regionPhoenixTucson:quarter3 2.603e-01 5.147e-02 5.058 4.27e-07 ***
regionPittsburgh:quarter3 -1.622e-02 5.147e-02 -0.315 0.752619
regionPlains:quarter3 1.348e-01 5.147e-02 2.619 0.008837 **
regionPortland:quarter3 3.344e-01 5.147e-02 6.498 8.33e-11 ***
regionRaleighGreensboro:quarter3 1.211e-01 5.147e-02 2.354 0.018600 *
regionRichmondNorfolk:quarter3 5.134e-02 5.147e-02 0.998 0.318528
regionRoanoke:quarter3 9.037e-02 5.147e-02 1.756 0.079127 .
regionSacramento:quarter3 1.815e-01 5.147e-02 3.527 0.000421 ***
regionSanDiego:quarter3 2.805e-01 5.147e-02 5.451 5.08e-08 ***
regionSanFrancisco:quarter3 3.126e-01 5.147e-02 6.074 1.27e-09 ***
regionSeattle:quarter3 3.922e-01 5.147e-02 7.620 2.66e-14 ***
regionSouthCarolina:quarter3 1.023e-01 5.147e-02 1.987 0.046905 *
regionSouthCentral:quarter3 4.390e-02 5.147e-02 0.853 0.393732
regionSoutheast:quarter3 1.067e-01 5.148e-02 2.073 0.038179 *
regionSpokane:quarter3 3.937e-01 5.147e-02 7.650 2.11e-14 ***
regionStLouis:quarter3 1.916e-01 5.147e-02 3.723 0.000197 ***
regionSyracuse:quarter3 -3.686e-02 5.147e-02 -0.716 0.473930
regionTampa:quarter3 -4.372e-02 5.147e-02 -0.850 0.395566
regionTotalUS:quarter3 9.405e-02 5.156e-02 1.824 0.068183 .
regionWest:quarter3 2.980e-01 5.147e-02 5.791 7.13e-09 ***
regionWestTexNewMexico:quarter3 1.785e-01 5.147e-02 3.469 0.000523 ***
regionAtlanta:quarter4 1.130e-01 5.110e-02 2.210 0.027086 *
regionBaltimoreWashington:quarter4 8.270e-02 5.110e-02 1.618 0.105581
regionBoise:quarter4 1.538e-01 5.110e-02 3.009 0.002626 **
regionBoston:quarter4 -1.075e-01 5.110e-02 -2.105 0.035345 *
regionBuffaloRochester:quarter4 -1.019e-01 5.110e-02 -1.994 0.046129 *
regionCalifornia:quarter4 2.298e-01 5.110e-02 4.496 6.96e-06 ***
regionCharlotte:quarter4 8.383e-02 5.110e-02 1.641 0.100897
regionChicago:quarter4 1.274e-01 5.110e-02 2.493 0.012665 *
regionCincinnatiDayton:quarter4 1.341e-01 5.110e-02 2.625 0.008682 **
regionColumbus:quarter4 5.517e-02 5.110e-02 1.080 0.280283
regionDallasFtWorth:quarter4 9.257e-02 5.110e-02 1.811 0.070085 .
regionDenver:quarter4 1.424e-01 5.110e-02 2.787 0.005319 **
regionDetroit:quarter4 6.867e-02 5.110e-02 1.344 0.179058
regionGrandRapids:quarter4 8.416e-02 5.110e-02 1.647 0.099577 .
regionGreatLakes:quarter4 8.786e-02 5.111e-02 1.719 0.085637 .
regionHarrisburgScranton:quarter4 -1.870e-02 5.110e-02 -0.366 0.714421
regionHartfordSpringfield:quarter4 7.018e-03 5.110e-02 0.137 0.890763
regionHouston:quarter4 1.181e-01 5.110e-02 2.311 0.020852 *
regionIndianapolis:quarter4 8.610e-02 5.110e-02 1.685 0.092029 .
regionJacksonville:quarter4 6.326e-02 5.110e-02 1.238 0.215724
regionLasVegas:quarter4 2.517e-01 5.110e-02 4.925 8.50e-07 ***
regionLosAngeles:quarter4 2.225e-01 5.110e-02 4.354 1.34e-05 ***
regionLouisville:quarter4 7.942e-02 5.110e-02 1.554 0.120131
regionMiamiFtLauderdale:quarter4 -7.519e-03 5.110e-02 -0.147 0.883012
regionMidsouth:quarter4 8.252e-02 5.110e-02 1.615 0.106367
regionNashville:quarter4 6.942e-02 5.110e-02 1.358 0.174327
regionNewOrleansMobile:quarter4 1.065e-01 5.110e-02 2.085 0.037083 *
regionNewYork:quarter4 6.475e-02 5.110e-02 1.267 0.205122
regionNortheast:quarter4 -1.518e-02 5.110e-02 -0.297 0.766432
regionNorthernNewEngland:quarter4 -2.143e-02 5.110e-02 -0.419 0.674965
regionOrlando:quarter4 7.442e-02 5.110e-02 1.456 0.145298
regionPhiladelphia:quarter4 4.312e-02 5.110e-02 0.844 0.398758
[ reached getOption("max.print") -- omitted 21 rows ]
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2419 on 18028 degrees of freedom
Multiple R-squared: 0.6434, Adjusted R-squared: 0.639
F-statistic: 147.8 on 220 and 18028 DF, p-value: < 2.2e-16
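The printed table above stops at the max.print limit, so 21 coefficient rows are not shown. One way to see every row is to keep the coefficient matrix returned by coef(summary(...)) and subset it, instead of reading the printed summary. The sketch below runs on simulated data because trimmed_avocados is not available here; the data frame d and all column names are hypothetical.

```r
set.seed(1)

# Simulated stand-in: a 10-level factor crossed with a 4-level factor,
# five observations per cell.
d <- expand.grid(region  = factor(paste0("r", 1:10)),
                 quarter = factor(1:4),
                 rep     = 1:5)
d$price <- rnorm(nrow(d), mean = 1.4, sd = 0.25)

fit <- lm(price ~ region * quarter, data = d)

# The coefficient table is an ordinary numeric matrix; no rows are dropped.
coefs <- coef(summary(fit))

# Keep only the interaction rows (their names contain a colon).
inter <- coefs[grepl(":", rownames(coefs), fixed = TRUE), , drop = FALSE]

# Interaction rows with p < 0.05, smallest p-value first.
sig <- inter[inter[, "Pr(>|t|)"] < 0.05, , drop = FALSE]
sig <- sig[order(sig[, "Pr(>|t|)"]), , drop = FALSE]

print(nrow(inter))
print(sig)
```

Raising the limit with options(max.print = ...) before calling summary() is the other route, at the cost of a much longer printout.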
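# Main effects plus a region:year interaction, so the year-to-year price change can differ by region (159 extra coefficients).
model5pf <- lm(average_price ~ type + region + quarter + year + x_large_bags + region:year, data = trimmed_avocados)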
par(mfrow = c(2, 2))
plot(model5pf)
summary(model5pf)
Call:
lm(formula = average_price ~ type + region + quarter + year +
x_large_bags + region:year, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.03187 -0.14124 -0.00167 0.13786 1.38842
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.175e+00 2.396e-02 49.024 < 2e-16 ***
typeorganic 4.975e-01 3.662e-03 135.876 < 2e-16 ***
regionAtlanta -1.582e-01 3.348e-02 -4.724 2.33e-06 ***
regionBaltimoreWashington -1.699e-01 3.348e-02 -5.075 3.92e-07 ***
regionBoise -1.650e-01 3.348e-02 -4.928 8.38e-07 ***
regionBoston -6.519e-02 3.348e-02 -1.947 0.051546 .
regionBuffaloRochester 5.865e-03 3.348e-02 0.175 0.860943
regionCalifornia -2.234e-01 3.348e-02 -6.671 2.61e-11 ***
regionCharlotte 3.702e-02 3.348e-02 1.106 0.268906
regionChicago -1.348e-01 3.348e-02 -4.026 5.69e-05 ***
regionCincinnatiDayton -3.367e-01 3.348e-02 -10.056 < 2e-16 ***
regionColumbus -2.651e-01 3.348e-02 -7.919 2.54e-15 ***
regionDallasFtWorth -4.610e-01 3.348e-02 -13.768 < 2e-16 ***
regionDenver -3.510e-01 3.348e-02 -10.482 < 2e-16 ***
regionDetroit -2.016e-01 3.349e-02 -6.021 1.77e-09 ***
regionGrandRapids -1.226e-01 3.348e-02 -3.662 0.000251 ***
regionGreatLakes -2.160e-01 3.353e-02 -6.442 1.21e-10 ***
regionHarrisburgScranton -6.714e-02 3.348e-02 -2.005 0.044969 *
regionHartfordSpringfield 2.090e-01 3.348e-02 6.243 4.38e-10 ***
regionHouston -4.908e-01 3.348e-02 -14.659 < 2e-16 ***
regionIndianapolis -1.960e-01 3.348e-02 -5.854 4.87e-09 ***
regionJacksonville -3.567e-02 3.348e-02 -1.065 0.286701
regionLasVegas -1.699e-01 3.348e-02 -5.074 3.93e-07 ***
regionLosAngeles -3.867e-01 3.348e-02 -11.549 < 2e-16 ***
regionLouisville -2.444e-01 3.348e-02 -7.300 3.00e-13 ***
regionMiamiFtLauderdale -1.552e-01 3.348e-02 -4.635 3.59e-06 ***
regionMidsouth -1.876e-01 3.348e-02 -5.603 2.13e-08 ***
regionNashville -2.616e-01 3.348e-02 -7.812 5.95e-15 ***
regionNewOrleansMobile -2.711e-01 3.348e-02 -8.095 6.07e-16 ***
regionNewYork 1.058e-01 3.348e-02 3.159 0.001587 **
regionNortheast 4.958e-03 3.348e-02 0.148 0.882293
regionNorthernNewEngland -6.538e-02 3.348e-02 -1.953 0.050862 .
regionOrlando -3.943e-02 3.348e-02 -1.177 0.239023
regionPhiladelphia 1.644e-02 3.348e-02 0.491 0.623441
regionPhoenixTucson -3.816e-01 3.348e-02 -11.398 < 2e-16 ***
regionPittsburgh -1.315e-01 3.348e-02 -3.929 8.58e-05 ***
regionPlains -1.011e-01 3.348e-02 -3.019 0.002543 **
regionPortland -2.320e-01 3.348e-02 -6.928 4.41e-12 ***
regionRaleighGreensboro -8.933e-02 3.348e-02 -2.668 0.007641 **
regionRichmondNorfolk -2.642e-01 3.348e-02 -7.892 3.15e-15 ***
regionRoanoke -3.116e-01 3.348e-02 -9.307 < 2e-16 ***
regionSacramento -8.471e-02 3.348e-02 -2.530 0.011414 *
regionSanDiego -2.645e-01 3.348e-02 -7.900 2.94e-15 ***
regionSanFrancisco 8.230e-02 3.348e-02 2.458 0.013980 *
regionSeattle -1.166e-01 3.348e-02 -3.481 0.000501 ***
regionSouthCarolina -8.404e-02 3.348e-02 -2.510 0.012085 *
regionSouthCentral -4.272e-01 3.348e-02 -12.759 < 2e-16 ***
regionSoutheast -1.241e-01 3.348e-02 -3.705 0.000212 ***
regionSpokane -1.384e-01 3.348e-02 -4.132 3.61e-05 ***
regionStLouis -3.539e-02 3.348e-02 -1.057 0.290606
regionSyracuse -9.711e-03 3.348e-02 -0.290 0.771793
regionTampa -1.821e-01 3.348e-02 -5.439 5.42e-08 ***
regionTotalUS -2.864e-01 3.358e-02 -8.531 < 2e-16 ***
regionWest -3.011e-01 3.348e-02 -8.992 < 2e-16 ***
regionWestTexNewMexico -2.767e-01 3.356e-02 -8.243 < 2e-16 ***
quarter2 8.069e-02 5.265e-03 15.325 < 2e-16 ***
quarter3 2.184e-01 5.268e-03 41.450 < 2e-16 ***
quarter4 1.620e-01 5.229e-03 30.989 < 2e-16 ***
year2016 -4.861e-03 3.348e-02 -0.145 0.884570
year2017 9.815e-02 3.332e-02 2.945 0.003230 **
year2018 1.233e-02 5.477e-02 0.225 0.821920
x_large_bags 2.575e-07 1.277e-07 2.017 0.043674 *
regionAtlanta:year2016 -1.617e-01 4.735e-02 -3.414 0.000641 ***
regionBaltimoreWashington:year2016 2.234e-01 4.735e-02 4.718 2.40e-06 ***
regionBoise:year2016 -2.270e-01 4.735e-02 -4.793 1.65e-06 ***
regionBoston:year2016 -4.263e-02 4.735e-02 -0.900 0.367957
regionBuffaloRochester:year2016 -5.600e-02 4.735e-02 -1.183 0.237011
regionCalifornia:year2016 1.603e-02 4.737e-02 0.338 0.735153
regionCharlotte:year2016 -7.308e-02 4.735e-02 -1.543 0.122784
regionChicago:year2016 1.479e-01 4.735e-02 3.123 0.001794 **
regionCincinnatiDayton:year2016 -1.091e-01 4.735e-02 -2.304 0.021212 *
regionColumbus:year2016 -8.266e-02 4.735e-02 -1.746 0.080880 .
regionDallasFtWorth:year2016 -7.729e-02 4.735e-02 -1.632 0.102631
regionDenver:year2016 -8.983e-02 4.735e-02 -1.897 0.057824 .
regionDetroit:year2016 -1.613e-01 4.735e-02 -3.406 0.000661 ***
regionGrandRapids:year2016 9.775e-02 4.735e-02 2.064 0.038991 *
regionGreatLakes:year2016 -4.595e-02 4.736e-02 -0.970 0.331929
regionHarrisburgScranton:year2016 4.470e-02 4.735e-02 0.944 0.345203
regionHartfordSpringfield:year2016 1.080e-01 4.735e-02 2.282 0.022511 *
regionHouston:year2016 -5.176e-02 4.735e-02 -1.093 0.274356
regionIndianapolis:year2016 -3.665e-02 4.735e-02 -0.774 0.438916
regionJacksonville:year2016 -1.307e-01 4.735e-02 -2.759 0.005797 **
regionLasVegas:year2016 -1.159e-02 4.735e-02 -0.245 0.806598
regionLosAngeles:year2016 -6.631e-02 4.737e-02 -1.400 0.161523
regionLouisville:year2016 -7.807e-02 4.735e-02 -1.649 0.099214 .
regionMiamiFtLauderdale:year2016 -9.933e-02 4.735e-02 -2.098 0.035939 *
regionMidsouth:year2016 3.128e-03 4.736e-02 0.066 0.947330
regionNashville:year2016 -1.562e-01 4.735e-02 -3.299 0.000971 ***
regionNewOrleansMobile:year2016 -1.471e-02 4.735e-02 -0.311 0.756016
regionNewYork:year2016 1.222e-01 4.735e-02 2.581 0.009869 **
regionNortheast:year2016 5.556e-02 4.736e-02 1.173 0.240727
regionNorthernNewEngland:year2016 -7.595e-02 4.735e-02 -1.604 0.108755
regionOrlando:year2016 -1.241e-01 4.735e-02 -2.620 0.008803 **
regionPhiladelphia:year2016 1.244e-01 4.735e-02 2.627 0.008634 **
regionPhoenixTucson:year2016 1.065e-01 4.735e-02 2.248 0.024571 *
regionPittsburgh:year2016 -5.907e-02 4.735e-02 -1.247 0.212245
regionPlains:year2016 -5.670e-02 4.736e-02 -1.197 0.231193
regionPortland:year2016 -1.103e-01 4.735e-02 -2.330 0.019806 *
regionRaleighGreensboro:year2016 3.099e-03 4.735e-02 0.065 0.947823
regionRichmondNorfolk:year2016 -5.864e-02 4.735e-02 -1.238 0.215586
regionRoanoke:year2016 -7.486e-02 4.735e-02 -1.581 0.113932
regionSacramento:year2016 2.189e-01 4.735e-02 4.623 3.80e-06 ***
regionSanDiego:year2016 4.432e-02 4.735e-02 0.936 0.349308
regionSanFrancisco:year2016 2.650e-01 4.735e-02 5.596 2.23e-08 ***
regionSeattle:year2016 -1.171e-01 4.735e-02 -2.473 0.013403 *
regionSouthCarolina:year2016 -1.449e-01 4.735e-02 -3.060 0.002215 **
regionSouthCentral:year2016 -8.303e-02 4.737e-02 -1.753 0.079655 .
regionSoutheast:year2016 -1.250e-01 4.736e-02 -2.640 0.008305 **
regionSpokane:year2016 -6.197e-02 4.735e-02 -1.309 0.190631
regionStLouis:year2016 -3.134e-01 4.735e-02 -6.619 3.73e-11 ***
regionSyracuse:year2016 -2.077e-02 4.735e-02 -0.439 0.660952
regionTampa:year2016 -8.761e-02 4.735e-02 -1.850 0.064290 .
regionTotalUS:year2016 -2.523e-03 4.782e-02 -0.053 0.957917
regionWest:year2016 -5.260e-02 4.735e-02 -1.111 0.266681
regionWestTexNewMexico:year2016 -1.118e-02 4.741e-02 -0.236 0.813538
regionAtlanta:year2017 -5.133e-02 4.713e-02 -1.089 0.276065
regionBaltimoreWashington:year2017 2.113e-01 4.713e-02 4.483 7.40e-06 ***
regionBoise:year2017 1.985e-02 4.713e-02 0.421 0.673552
regionBoston:year2017 1.068e-01 4.713e-02 2.267 0.023398 *
regionBuffaloRochester:year2017 -5.601e-02 4.713e-02 -1.189 0.234633
regionCalifornia:year2017 1.126e-01 4.723e-02 2.384 0.017123 *
regionCharlotte:year2017 9.489e-02 4.713e-02 2.013 0.044081 *
regionChicago:year2017 2.114e-01 4.713e-02 4.486 7.31e-06 ***
regionCincinnatiDayton:year2017 1.831e-02 4.713e-02 0.389 0.697600
regionColumbus:year2017 -5.701e-02 4.713e-02 -1.210 0.226414
regionDallasFtWorth:year2017 1.122e-04 4.713e-02 0.002 0.998100
regionDenver:year2017 7.089e-02 4.713e-02 1.504 0.132563
regionDetroit:year2017 -9.823e-02 4.713e-02 -2.084 0.037149 *
regionGrandRapids:year2017 1.115e-01 4.713e-02 2.366 0.017969 *
regionGreatLakes:year2017 -2.282e-03 4.713e-02 -0.048 0.961394
regionHarrisburgScranton:year2017 2.497e-02 4.713e-02 0.530 0.596285
regionHartfordSpringfield:year2017 4.139e-02 4.713e-02 0.878 0.379792
regionHouston:year2017 -4.291e-02 4.713e-02 -0.911 0.362553
regionIndianapolis:year2017 -1.111e-01 4.713e-02 -2.357 0.018445 *
regionJacksonville:year2017 6.928e-02 4.713e-02 1.470 0.141549
regionLasVegas:year2017 -5.007e-02 4.713e-02 -1.062 0.288096
regionLosAngeles:year2017 1.211e-01 4.719e-02 2.566 0.010299 *
regionLouisville:year2017 -3.632e-02 4.713e-02 -0.771 0.440950
regionMiamiFtLauderdale:year2017 1.547e-01 4.713e-02 3.282 0.001034 **
regionMidsouth:year2017 6.894e-02 4.713e-02 1.463 0.143549
regionNashville:year2017 -1.364e-01 4.713e-02 -2.894 0.003810 **
regionNewOrleansMobile:year2017 5.163e-02 4.713e-02 1.095 0.273319
regionNewYork:year2017 6.570e-02 4.713e-02 1.394 0.163320
regionNortheast:year2017 4.965e-02 4.713e-02 1.053 0.292145
regionNorthernNewEngland:year2017 4.649e-03 4.713e-02 0.099 0.921414
regionOrlando:year2017 8.160e-02 4.713e-02 1.731 0.083380 .
regionPhiladelphia:year2017 5.293e-02 4.713e-02 1.123 0.261438
regionPhoenixTucson:year2017 1.622e-02 4.713e-02 0.344 0.730705
regionPittsburgh:year2017 -1.434e-01 4.713e-02 -3.042 0.002352 **
regionPlains:year2017 -2.733e-02 4.713e-02 -0.580 0.561981
regionPortland:year2017 2.846e-02 4.713e-02 0.604 0.545918
regionRaleighGreensboro:year2017 2.201e-01 4.713e-02 4.671 3.02e-06 ***
regionRichmondNorfolk:year2017 2.554e-02 4.713e-02 0.542 0.587832
regionRoanoke:year2017 3.208e-02 4.713e-02 0.681 0.496074
regionSacramento:year2017 2.207e-01 4.713e-02 4.683 2.84e-06 ***
regionSanDiego:year2017 2.111e-01 4.713e-02 4.478 7.57e-06 ***
regionSanFrancisco:year2017 2.455e-01 4.713e-02 5.210 1.91e-07 ***
regionSeattle:year2017 7.804e-02 4.713e-02 1.656 0.097776 .
regionSouthCarolina:year2017 -7.433e-02 4.713e-02 -1.577 0.114754
regionSouthCentral:year2017 -4.930e-02 4.713e-02 -1.046 0.295573
regionSoutheast:year2017 -5.160e-03 4.716e-02 -0.109 0.912881
regionSpokane:year2017 1.051e-01 4.713e-02 2.230 0.025752 *
regionStLouis:year2017 -1.074e-02 4.713e-02 -0.228 0.819670
regionSyracuse:year2017 -3.869e-02 4.713e-02 -0.821 0.411693
regionTampa:year2017 1.635e-01 4.713e-02 3.469 0.000525 ***
regionTotalUS:year2017 6.318e-02 4.787e-02 1.320 0.186917
regionWest:year2017 5.256e-02 4.713e-02 1.115 0.264729
regionWestTexNewMexico:year2017 -7.553e-02 4.730e-02 -1.597 0.110308
regionAtlanta:year2018 1.072e-02 7.733e-02 0.139 0.889782
regionBaltimoreWashington:year2018 1.124e-01 7.733e-02 1.453 0.146104
regionBoise:year2018 2.217e-01 7.733e-02 2.867 0.004149 **
regionBoston:year2018 2.059e-01 7.733e-02 2.663 0.007742 **
regionBuffaloRochester:year2018 -2.155e-01 7.733e-02 -2.787 0.005332 **
regionCalifornia:year2018 1.892e-01 7.746e-02 2.443 0.014578 *
regionCharlotte:year2018 9.679e-03 7.733e-02 0.125 0.900390
regionChicago:year2018 2.606e-01 7.733e-02 3.370 0.000753 ***
regionCincinnatiDayton:year2018 1.764e-01 7.733e-02 2.281 0.022553 *
regionColumbus:year2018 9.079e-04 7.733e-02 0.012 0.990632
regionDallasFtWorth:year2018 1.265e-01 7.733e-02 1.636 0.101773
regionDenver:year2018 1.960e-01 7.733e-02 2.535 0.011261 *
regionDetroit:year2018 -5.785e-02 7.733e-02 -0.748 0.454421
regionGrandRapids:year2018 1.287e-02 7.733e-02 0.166 0.867847
regionGreatLakes:year2018 4.963e-02 7.737e-02 0.642 0.521201
regionHarrisburgScranton:year2018 -3.216e-02 7.733e-02 -0.416 0.677506
regionHartfordSpringfield:year2018 3.256e-02 7.733e-02 0.421 0.673717
regionHouston:year2018 9.701e-02 7.733e-02 1.255 0.209633
regionIndianapolis:year2018 -7.169e-02 7.733e-02 -0.927 0.353907
regionJacksonville:year2018 5.652e-02 7.733e-02 0.731 0.464859
regionLasVegas:year2018 1.278e-01 7.733e-02 1.653 0.098411 .
regionLosAngeles:year2018 2.964e-01 7.738e-02 3.831 0.000128 ***
regionLouisville:year2018 7.646e-02 7.733e-02 0.989 0.322751
regionMiamiFtLauderdale:year2018 6.354e-02 7.733e-02 0.822 0.411224
regionMidsouth:year2018 1.091e-01 7.733e-02 1.411 0.158407
regionNashville:year2018 4.803e-02 7.733e-02 0.621 0.534514
regionNewOrleansMobile:year2018 3.935e-02 7.733e-02 0.509 0.610832
regionNewYork:year2018 3.283e-02 7.733e-02 0.425 0.671123
regionNortheast:year2018 3.237e-02 7.733e-02 0.419 0.675477
regionNorthernNewEngland:year2018 5.076e-02 7.733e-02 0.656 0.511534
regionOrlando:year2018 -4.181e-02 7.733e-02 -0.541 0.588694
regionPhiladelphia:year2018 -3.644e-03 7.733e-02 -0.047 0.962413
[ reached getOption("max.print") -- omitted 21 rows ]
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2414 on 18028 degrees of freedom
Multiple R-squared: 0.6448, Adjusted R-squared: 0.6405
F-statistic: 148.8 on 220 and 18028 DF, p-value: < 2.2e-16
model5pg <- lm(average_price ~ type + region + quarter + year + x_large_bags + region:x_large_bags, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model5pg)
summary(model5pg)
Call:
lm(formula = average_price ~ type + region + quarter + year +
x_large_bags + region:x_large_bags, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.00590 -0.14516 -0.00347 0.14267 1.44125
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.157e+00 1.475e-02 78.422 < 2e-16 ***
typeorganic 4.999e-01 4.037e-03 123.829 < 2e-16 ***
regionAtlanta -2.087e-01 2.013e-02 -10.371 < 2e-16 ***
regionBaltimoreWashington -3.231e-02 1.986e-02 -1.627 0.103767
regionBoise -1.940e-01 1.977e-02 -9.809 < 2e-16 ***
regionBoston -3.887e-02 1.994e-02 -1.949 0.051298 .
regionBuffaloRochester -3.757e-02 1.981e-02 -1.896 0.057934 .
regionCalifornia -1.486e-01 2.138e-02 -6.952 3.73e-12 ***
regionCharlotte 5.896e-02 1.972e-02 2.989 0.002799 **
regionChicago -1.992e-02 2.044e-02 -0.975 0.329738
regionCincinnatiDayton -3.552e-01 2.050e-02 -17.331 < 2e-16 ***
regionColumbus -3.118e-01 2.049e-02 -15.216 < 2e-16 ***
regionDallasFtWorth -4.683e-01 1.983e-02 -23.612 < 2e-16 ***
regionDenver -3.389e-01 1.973e-02 -17.180 < 2e-16 ***
regionDetroit -3.106e-01 2.107e-02 -14.743 < 2e-16 ***
regionGrandRapids -7.090e-02 2.040e-02 -3.475 0.000512 ***
regionGreatLakes -2.471e-01 2.142e-02 -11.533 < 2e-16 ***
regionHarrisburgScranton -3.937e-02 1.982e-02 -1.986 0.047013 *
regionHartfordSpringfield 2.737e-01 1.994e-02 13.724 < 2e-16 ***
regionHouston -5.100e-01 1.971e-02 -25.876 < 2e-16 ***
regionIndianapolis -2.525e-01 2.077e-02 -12.158 < 2e-16 ***
regionJacksonville -3.545e-02 1.987e-02 -1.784 0.074463 .
regionLasVegas -1.643e-01 1.982e-02 -8.289 < 2e-16 ***
regionLosAngeles -3.536e-01 2.157e-02 -16.388 < 2e-16 ***
regionLouisville -2.774e-01 2.048e-02 -13.545 < 2e-16 ***
regionMiamiFtLauderdale -1.309e-01 1.981e-02 -6.605 4.08e-11 ***
regionMidsouth -1.541e-01 2.005e-02 -7.685 1.60e-14 ***
regionNashville -3.399e-01 2.060e-02 -16.496 < 2e-16 ***
regionNewOrleansMobile -2.666e-01 2.032e-02 -13.122 < 2e-16 ***
regionNewYork 1.685e-01 1.998e-02 8.431 < 2e-16 ***
regionNortheast 4.073e-02 1.997e-02 2.039 0.041431 *
regionNorthernNewEngland -8.220e-02 1.977e-02 -4.158 3.22e-05 ***
regionOrlando -3.994e-02 1.979e-02 -2.019 0.043541 *
regionPhiladelphia 7.529e-02 1.985e-02 3.792 0.000150 ***
regionPhoenixTucson -2.935e-01 1.998e-02 -14.689 < 2e-16 ***
regionPittsburgh -1.966e-01 1.969e-02 -9.988 < 2e-16 ***
regionPlains -1.101e-01 2.035e-02 -5.409 6.42e-08 ***
regionPortland -2.287e-01 2.014e-02 -11.354 < 2e-16 ***
regionRaleighGreensboro 8.980e-03 1.965e-02 0.457 0.647707
regionRichmondNorfolk -2.639e-01 1.975e-02 -13.361 < 2e-16 ***
regionRoanoke -3.083e-01 1.979e-02 -15.577 < 2e-16 ***
regionSacramento 9.105e-02 2.024e-02 4.498 6.89e-06 ***
regionSanDiego -1.403e-01 2.038e-02 -6.887 5.89e-12 ***
regionSanFrancisco 2.908e-01 2.051e-02 14.180 < 2e-16 ***
regionSeattle -1.056e-01 2.097e-02 -5.035 4.81e-07 ***
regionSouthCarolina -1.508e-01 2.000e-02 -7.541 4.87e-14 ***
regionSouthCentral -4.547e-01 2.026e-02 -22.448 < 2e-16 ***
regionSoutheast -1.581e-01 2.016e-02 -7.843 4.63e-15 ***
regionSpokane -8.415e-02 2.025e-02 -4.156 3.25e-05 ***
regionStLouis -1.118e-01 1.972e-02 -5.670 1.45e-08 ***
regionSyracuse -3.950e-02 1.975e-02 -2.000 0.045480 *
regionTampa -1.470e-01 1.980e-02 -7.427 1.16e-13 ***
regionTotalUS -2.436e-01 2.125e-02 -11.463 < 2e-16 ***
regionWest -2.673e-01 2.074e-02 -12.891 < 2e-16 ***
regionWestTexNewMexico -2.812e-01 1.972e-02 -14.260 < 2e-16 ***
quarter2 7.926e-02 5.408e-03 14.654 < 2e-16 ***
quarter3 2.156e-01 5.472e-03 39.402 < 2e-16 ***
quarter4 1.645e-01 5.346e-03 30.762 < 2e-16 ***
year2016 -3.867e-02 4.730e-03 -8.175 3.14e-16 ***
year2017 1.399e-01 4.764e-03 29.355 < 2e-16 ***
year2018 9.389e-02 8.481e-03 11.071 < 2e-16 ***
x_large_bags 6.780e-05 3.165e-05 2.142 0.032202 *
regionAtlanta:x_large_bags -7.465e-05 3.229e-05 -2.311 0.020817 *
regionBaltimoreWashington:x_large_bags -4.459e-05 3.238e-05 -1.377 0.168604
regionBoise:x_large_bags -3.981e-04 1.293e-04 -3.079 0.002077 **
regionBoston:x_large_bags 1.603e-06 3.660e-05 0.044 0.965060
regionBuffaloRochester:x_large_bags -5.922e-05 3.576e-05 -1.656 0.097767 .
regionCalifornia:x_large_bags -6.834e-05 3.165e-05 -2.159 0.030867 *
regionCharlotte:x_large_bags -9.327e-05 3.610e-05 -2.584 0.009779 **
regionChicago:x_large_bags -4.606e-05 3.215e-05 -1.433 0.151906
regionCincinnatiDayton:x_large_bags -5.231e-05 3.275e-05 -1.597 0.110224
regionColumbus:x_large_bags -4.902e-05 3.321e-05 -1.476 0.139968
regionDallasFtWorth:x_large_bags -6.657e-05 3.181e-05 -2.093 0.036381 *
regionDenver:x_large_bags -3.469e-05 3.926e-05 -0.884 0.376950
regionDetroit:x_large_bags -6.075e-05 3.169e-05 -1.917 0.055232 .
regionGrandRapids:x_large_bags -5.830e-05 3.174e-05 -1.837 0.066265 .
regionGreatLakes:x_large_bags -6.604e-05 3.165e-05 -2.086 0.036958 *
regionHarrisburgScranton:x_large_bags -6.708e-05 3.286e-05 -2.042 0.041194 *
regionHartfordSpringfield:x_large_bags -9.982e-05 3.752e-05 -2.660 0.007810 **
regionHouston:x_large_bags -6.198e-05 3.185e-05 -1.946 0.051713 .
regionIndianapolis:x_large_bags -5.092e-05 3.283e-05 -1.551 0.120934
regionJacksonville:x_large_bags -8.681e-05 3.455e-05 -2.513 0.011991 *
regionLasVegas:x_large_bags -2.179e-04 9.195e-05 -2.370 0.017784 *
regionLosAngeles:x_large_bags -6.637e-05 3.166e-05 -2.097 0.036039 *
regionLouisville:x_large_bags -3.106e-05 3.772e-05 -0.823 0.410237
regionMiamiFtLauderdale:x_large_bags -6.002e-05 3.195e-05 -1.879 0.060329 .
regionMidsouth:x_large_bags -6.620e-05 3.167e-05 -2.090 0.036587 *
regionNashville:x_large_bags -6.882e-05 3.798e-05 -1.812 0.070007 .
regionNewOrleansMobile:x_large_bags -5.523e-05 3.188e-05 -1.732 0.083257 .
regionNewYork:x_large_bags -6.142e-05 3.195e-05 -1.922 0.054590 .
regionNortheast:x_large_bags -6.551e-05 3.167e-05 -2.069 0.038594 *
regionNorthernNewEngland:x_large_bags -4.558e-05 3.371e-05 -1.352 0.176342
regionOrlando:x_large_bags -7.638e-05 3.209e-05 -2.380 0.017329 *
regionPhiladelphia:x_large_bags -5.342e-05 3.439e-05 -1.553 0.120351
regionPhoenixTucson:x_large_bags -1.429e-04 3.331e-05 -4.291 1.79e-05 ***
regionPittsburgh:x_large_bags -1.719e-05 3.737e-05 -0.460 0.645586
regionPlains:x_large_bags -6.954e-05 3.170e-05 -2.194 0.028244 *
regionPortland:x_large_bags -9.347e-05 3.944e-05 -2.370 0.017806 *
regionRaleighGreensboro:x_large_bags -8.980e-05 3.359e-05 -2.674 0.007512 **
regionRichmondNorfolk:x_large_bags -5.979e-05 3.324e-05 -1.799 0.072098 .
regionRoanoke:x_large_bags -5.109e-05 3.583e-05 -1.426 0.153899
regionSacramento:x_large_bags -1.031e-04 3.298e-05 -3.126 0.001774 **
regionSanDiego:x_large_bags -1.000e-04 3.480e-05 -2.874 0.004055 **
regionSanFrancisco:x_large_bags -1.300e-04 3.336e-05 -3.896 9.81e-05 ***
regionSeattle:x_large_bags -8.839e-05 5.055e-05 -1.749 0.080361 .
regionSouthCarolina:x_large_bags -6.511e-05 3.246e-05 -2.006 0.044848 *
regionSouthCentral:x_large_bags -6.733e-05 3.166e-05 -2.127 0.033439 *
regionSoutheast:x_large_bags -6.729e-05 3.166e-05 -2.126 0.033558 *
regionSpokane:x_large_bags -1.130e-03 2.762e-04 -4.093 4.28e-05 ***
regionStLouis:x_large_bags -8.268e-05 3.208e-05 -2.577 0.009971 **
regionSyracuse:x_large_bags -7.061e-06 4.375e-05 -0.161 0.871772
regionTampa:x_large_bags -6.270e-05 3.214e-05 -1.951 0.051068 .
regionTotalUS:x_large_bags -6.764e-05 3.165e-05 -2.137 0.032614 *
regionWest:x_large_bags -7.298e-05 3.178e-05 -2.297 0.021641 *
regionWestTexNewMexico:x_large_bags -7.497e-05 3.184e-05 -2.354 0.018573 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2465 on 18134 degrees of freedom
Multiple R-squared: 0.6276, Adjusted R-squared: 0.6253
F-statistic: 268.1 on 114 and 18134 DF, p-value: < 2.2e-16
model5ph <- lm(average_price ~ type + region + quarter + year + x_large_bags + quarter:year, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model5ph)
summary(model5ph)
Call:
lm(formula = average_price ~ type + region + quarter + year +
x_large_bags + quarter:year, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-0.96209 -0.13588 -0.00192 0.13567 1.48311
Coefficients: (3 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.259e+00 1.454e-02 86.603 < 2e-16 ***
typeorganic 4.983e-01 3.630e-03 137.274 < 2e-16 ***
regionAtlanta -2.233e-01 1.846e-02 -12.101 < 2e-16 ***
regionBaltimoreWashington -2.699e-02 1.846e-02 -1.463 0.143613
regionBoise -2.129e-01 1.846e-02 -11.533 < 2e-16 ***
regionBoston -3.020e-02 1.846e-02 -1.636 0.101842
regionBuffaloRochester -4.425e-02 1.846e-02 -2.397 0.016525 *
regionCalifornia -1.717e-01 1.855e-02 -9.257 < 2e-16 ***
regionCharlotte 4.497e-02 1.846e-02 2.437 0.014836 *
regionChicago -4.646e-03 1.846e-02 -0.252 0.801250
regionCincinnatiDayton -3.521e-01 1.846e-02 -19.077 < 2e-16 ***
regionColumbus -3.085e-01 1.846e-02 -16.713 < 2e-16 ***
regionDallasFtWorth -4.759e-01 1.846e-02 -25.784 < 2e-16 ***
regionDenver -3.425e-01 1.846e-02 -18.556 < 2e-16 ***
regionDetroit -2.868e-01 1.847e-02 -15.531 < 2e-16 ***
regionGrandRapids -5.695e-02 1.846e-02 -3.085 0.002036 **
regionGreatLakes -2.298e-01 1.859e-02 -12.358 < 2e-16 ***
regionHarrisburgScranton -4.788e-02 1.846e-02 -2.594 0.009488 **
regionHartfordSpringfield 2.576e-01 1.846e-02 13.955 < 2e-16 ***
regionHouston -5.134e-01 1.846e-02 -27.819 < 2e-16 ***
regionIndianapolis -2.473e-01 1.846e-02 -13.400 < 2e-16 ***
regionJacksonville -5.016e-02 1.846e-02 -2.718 0.006578 **
regionLasVegas -1.801e-01 1.846e-02 -9.758 < 2e-16 ***
regionLosAngeles -3.497e-01 1.851e-02 -18.888 < 2e-16 ***
regionLouisville -2.744e-01 1.846e-02 -14.869 < 2e-16 ***
regionMiamiFtLauderdale -1.328e-01 1.846e-02 -7.198 6.36e-13 ***
regionMidsouth -1.578e-01 1.846e-02 -8.548 < 2e-16 ***
regionNashville -3.490e-01 1.846e-02 -18.910 < 2e-16 ***
regionNewOrleansMobile -2.568e-01 1.846e-02 -13.913 < 2e-16 ***
regionNewYork 1.662e-01 1.846e-02 9.004 < 2e-16 ***
regionNortheast 3.943e-02 1.846e-02 2.136 0.032703 *
regionNorthernNewEngland -8.372e-02 1.846e-02 -4.536 5.77e-06 ***
regionOrlando -5.505e-02 1.846e-02 -2.983 0.002860 **
regionPhiladelphia 7.102e-02 1.846e-02 3.848 0.000119 ***
regionPhoenixTucson -3.367e-01 1.846e-02 -18.245 < 2e-16 ***
regionPittsburgh -1.967e-01 1.846e-02 -10.659 < 2e-16 ***
regionPlains -1.258e-01 1.846e-02 -6.812 9.90e-12 ***
regionPortland -2.434e-01 1.846e-02 -13.185 < 2e-16 ***
regionRaleighGreensboro -5.977e-03 1.846e-02 -0.324 0.746076
regionRichmondNorfolk -2.698e-01 1.846e-02 -14.618 < 2e-16 ***
regionRoanoke -3.131e-01 1.846e-02 -16.967 < 2e-16 ***
regionSacramento 6.034e-02 1.846e-02 3.269 0.001079 **
regionSanDiego -1.630e-01 1.846e-02 -8.831 < 2e-16 ***
regionSanFrancisco 2.430e-01 1.846e-02 13.165 < 2e-16 ***
regionSeattle -1.185e-01 1.846e-02 -6.420 1.40e-10 ***
regionSouthCarolina -1.580e-01 1.846e-02 -8.559 < 2e-16 ***
regionSouthCentral -4.628e-01 1.848e-02 -25.043 < 2e-16 ***
regionSoutheast -1.659e-01 1.848e-02 -8.977 < 2e-16 ***
regionSpokane -1.154e-01 1.846e-02 -6.253 4.12e-10 ***
regionStLouis -1.306e-01 1.846e-02 -7.077 1.52e-12 ***
regionSyracuse -4.071e-02 1.846e-02 -2.206 0.027420 *
regionTampa -1.524e-01 1.846e-02 -8.258 < 2e-16 ***
regionTotalUS -2.666e-01 1.998e-02 -13.348 < 2e-16 ***
regionWest -2.897e-01 1.846e-02 -15.696 < 2e-16 ***
regionWestTexNewMexico -2.969e-01 1.850e-02 -16.051 < 2e-16 ***
quarter2 2.117e-02 9.056e-03 2.338 0.019420 *
quarter3 8.279e-02 9.056e-03 9.142 < 2e-16 ***
quarter4 -1.080e-02 9.058e-03 -1.192 0.233314
year2016 -1.186e-01 9.059e-03 -13.097 < 2e-16 ***
year2017 -5.756e-02 9.061e-03 -6.352 2.17e-10 ***
year2018 -6.568e-03 9.262e-03 -0.709 0.478278
x_large_bags 3.887e-07 1.206e-07 3.222 0.001273 **
quarter2:year2016 -2.921e-02 1.281e-02 -2.281 0.022572 *
quarter3:year2016 9.430e-02 1.281e-02 7.362 1.89e-13 ***
quarter4:year2016 2.576e-01 1.281e-02 20.108 < 2e-16 ***
quarter2:year2017 2.074e-01 1.281e-02 16.187 < 2e-16 ***
quarter3:year2017 3.116e-01 1.281e-02 24.323 < 2e-16 ***
quarter4:year2017 2.620e-01 1.270e-02 20.641 < 2e-16 ***
quarter2:year2018 NA NA NA NA
quarter3:year2018 NA NA NA NA
quarter4:year2018 NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2399 on 18181 degrees of freedom
Multiple R-squared: 0.6463, Adjusted R-squared: 0.645
F-statistic: 495.8 on 67 and 18181 DF, p-value: < 2.2e-16
model5pi <- lm(average_price ~ type + region + quarter + year + x_large_bags + quarter:x_large_bags, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model5pi)
summary(model5pi)
Call:
lm(formula = average_price ~ type + region + quarter + year +
x_large_bags + quarter:x_large_bags, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.0362 -0.1455 -0.0045 0.1442 1.4394
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.167e+00 1.429e-02 81.659 < 2e-16 ***
typeorganic 4.981e-01 3.765e-03 132.295 < 2e-16 ***
regionAtlanta -2.233e-01 1.909e-02 -11.698 < 2e-16 ***
regionBaltimoreWashington -2.698e-02 1.909e-02 -1.413 0.157567
regionBoise -2.129e-01 1.909e-02 -11.150 < 2e-16 ***
regionBoston -3.019e-02 1.909e-02 -1.581 0.113792
regionBuffaloRochester -4.424e-02 1.909e-02 -2.317 0.020490 *
regionCalifornia -1.710e-01 1.923e-02 -8.894 < 2e-16 ***
regionCharlotte 4.497e-02 1.909e-02 2.356 0.018507 *
regionChicago -4.605e-03 1.909e-02 -0.241 0.809395
regionCincinnatiDayton -3.521e-01 1.909e-02 -18.440 < 2e-16 ***
regionColumbus -3.084e-01 1.909e-02 -16.156 < 2e-16 ***
regionDallasFtWorth -4.758e-01 1.909e-02 -24.922 < 2e-16 ***
regionDenver -3.425e-01 1.909e-02 -17.938 < 2e-16 ***
regionDetroit -2.866e-01 1.910e-02 -15.004 < 2e-16 ***
regionGrandRapids -5.683e-02 1.909e-02 -2.977 0.002919 **
regionGreatLakes -2.290e-01 1.926e-02 -11.892 < 2e-16 ***
regionHarrisburgScranton -4.787e-02 1.909e-02 -2.507 0.012170 *
regionHartfordSpringfield 2.576e-01 1.909e-02 13.491 < 2e-16 ***
regionHouston -5.134e-01 1.909e-02 -26.891 < 2e-16 ***
regionIndianapolis -2.473e-01 1.909e-02 -12.953 < 2e-16 ***
regionJacksonville -5.016e-02 1.909e-02 -2.627 0.008616 **
regionLasVegas -1.801e-01 1.909e-02 -9.433 < 2e-16 ***
regionLosAngeles -3.492e-01 1.917e-02 -18.212 < 2e-16 ***
regionLouisville -2.744e-01 1.909e-02 -14.374 < 2e-16 ***
regionMiamiFtLauderdale -1.328e-01 1.909e-02 -6.958 3.57e-12 ***
regionMidsouth -1.577e-01 1.910e-02 -8.258 < 2e-16 ***
regionNashville -3.490e-01 1.909e-02 -18.281 < 2e-16 ***
regionNewOrleansMobile -2.567e-01 1.909e-02 -13.447 < 2e-16 ***
regionNewYork 1.662e-01 1.909e-02 8.705 < 2e-16 ***
regionNortheast 3.951e-02 1.910e-02 2.069 0.038564 *
regionNorthernNewEngland -8.371e-02 1.909e-02 -4.385 1.17e-05 ***
regionOrlando -5.505e-02 1.909e-02 -2.883 0.003941 **
regionPhiladelphia 7.103e-02 1.909e-02 3.721 0.000199 ***
regionPhoenixTucson -3.367e-01 1.909e-02 -17.636 < 2e-16 ***
regionPittsburgh -1.967e-01 1.909e-02 -10.305 < 2e-16 ***
regionPlains -1.257e-01 1.910e-02 -6.582 4.78e-11 ***
regionPortland -2.433e-01 1.909e-02 -12.746 < 2e-16 ***
regionRaleighGreensboro -5.973e-03 1.909e-02 -0.313 0.754386
regionRichmondNorfolk -2.698e-01 1.909e-02 -14.132 < 2e-16 ***
regionRoanoke -3.131e-01 1.909e-02 -16.402 < 2e-16 ***
regionSacramento 6.037e-02 1.909e-02 3.162 0.001568 **
regionSanDiego -1.630e-01 1.909e-02 -8.536 < 2e-16 ***
regionSanFrancisco 2.430e-01 1.909e-02 12.728 < 2e-16 ***
regionSeattle -1.185e-01 1.909e-02 -6.206 5.55e-10 ***
regionSouthCarolina -1.580e-01 1.909e-02 -8.273 < 2e-16 ***
regionSouthCentral -4.624e-01 1.912e-02 -24.183 < 2e-16 ***
regionSoutheast -1.657e-01 1.911e-02 -8.671 < 2e-16 ***
regionSpokane -1.154e-01 1.909e-02 -6.045 1.53e-09 ***
regionStLouis -1.306e-01 1.909e-02 -6.841 8.09e-12 ***
regionSyracuse -4.071e-02 1.909e-02 -2.132 0.032995 *
regionTampa -1.524e-01 1.909e-02 -7.983 1.52e-15 ***
regionTotalUS -2.643e-01 2.085e-02 -12.677 < 2e-16 ***
regionWest -2.896e-01 1.910e-02 -15.166 < 2e-16 ***
regionWestTexNewMexico -2.969e-01 1.913e-02 -15.516 < 2e-16 ***
quarter2 8.023e-02 5.472e-03 14.661 < 2e-16 ***
quarter3 2.180e-01 5.470e-03 39.862 < 2e-16 ***
quarter4 1.620e-01 5.440e-03 29.780 < 2e-16 ***
year2016 -3.793e-02 4.696e-03 -8.079 6.94e-16 ***
year2017 1.375e-01 4.681e-03 29.369 < 2e-16 ***
year2018 8.566e-02 8.383e-03 10.219 < 2e-16 ***
x_large_bags 2.976e-07 2.196e-07 1.355 0.175445
quarter2:x_large_bags 1.247e-07 2.852e-07 0.437 0.661831
quarter3:x_large_bags 5.626e-08 2.654e-07 0.212 0.832142
quarter4:x_large_bags 2.886e-08 4.411e-07 0.065 0.947840
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2482 on 18184 degrees of freedom
Multiple R-squared: 0.6214, Adjusted R-squared: 0.6201
F-statistic: 466.4 on 64 and 18184 DF, p-value: < 2.2e-16
model5pj <- lm(average_price ~ type + region + quarter + year + x_large_bags + year:x_large_bags, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model5pj)
summary(model5pj)
Call:
lm(formula = average_price ~ type + region + quarter + year +
x_large_bags + year:x_large_bags, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.03659 -0.14579 -0.00433 0.14385 1.44061
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.168e+00 1.429e-02 81.749 < 2e-16 ***
typeorganic 4.974e-01 3.764e-03 132.154 < 2e-16 ***
regionAtlanta -2.234e-01 1.909e-02 -11.704 < 2e-16 ***
regionBaltimoreWashington -2.699e-02 1.909e-02 -1.414 0.157396
regionBoise -2.129e-01 1.909e-02 -11.152 < 2e-16 ***
regionBoston -3.020e-02 1.909e-02 -1.582 0.113661
regionBuffaloRochester -4.425e-02 1.909e-02 -2.318 0.020447 *
regionCalifornia -1.711e-01 1.919e-02 -8.919 < 2e-16 ***
regionCharlotte 4.496e-02 1.909e-02 2.356 0.018508 *
regionChicago -4.466e-03 1.909e-02 -0.234 0.815010
regionCincinnatiDayton -3.516e-01 1.909e-02 -18.420 < 2e-16 ***
regionColumbus -3.080e-01 1.909e-02 -16.136 < 2e-16 ***
regionDallasFtWorth -4.756e-01 1.909e-02 -24.916 < 2e-16 ***
regionDenver -3.424e-01 1.909e-02 -17.941 < 2e-16 ***
regionDetroit -2.846e-01 1.911e-02 -14.895 < 2e-16 ***
regionGrandRapids -5.664e-02 1.909e-02 -2.967 0.003014 **
regionGreatLakes -2.233e-01 1.934e-02 -11.543 < 2e-16 ***
regionHarrisburgScranton -4.784e-02 1.909e-02 -2.506 0.012213 *
regionHartfordSpringfield 2.576e-01 1.909e-02 13.494 < 2e-16 ***
regionHouston -5.131e-01 1.909e-02 -26.880 < 2e-16 ***
regionIndianapolis -2.468e-01 1.909e-02 -12.931 < 2e-16 ***
regionJacksonville -5.016e-02 1.909e-02 -2.628 0.008599 **
regionLasVegas -1.801e-01 1.909e-02 -9.435 < 2e-16 ***
regionLosAngeles -3.490e-01 1.915e-02 -18.229 < 2e-16 ***
regionLouisville -2.742e-01 1.909e-02 -14.368 < 2e-16 ***
regionMiamiFtLauderdale -1.328e-01 1.909e-02 -6.959 3.55e-12 ***
regionMidsouth -1.574e-01 1.909e-02 -8.244 < 2e-16 ***
regionNashville -3.490e-01 1.909e-02 -18.284 < 2e-16 ***
regionNewOrleansMobile -2.568e-01 1.909e-02 -13.454 < 2e-16 ***
regionNewYork 1.661e-01 1.909e-02 8.703 < 2e-16 ***
regionNortheast 3.946e-02 1.909e-02 2.067 0.038749 *
regionNorthernNewEngland -8.372e-02 1.909e-02 -4.386 1.16e-05 ***
regionOrlando -5.503e-02 1.909e-02 -2.883 0.003940 **
regionPhiladelphia 7.103e-02 1.909e-02 3.721 0.000199 ***
regionPhoenixTucson -3.367e-01 1.909e-02 -17.642 < 2e-16 ***
regionPittsburgh -1.967e-01 1.909e-02 -10.307 < 2e-16 ***
regionPlains -1.253e-01 1.909e-02 -6.565 5.33e-11 ***
regionPortland -2.433e-01 1.909e-02 -12.745 < 2e-16 ***
regionRaleighGreensboro -5.975e-03 1.909e-02 -0.313 0.754243
regionRichmondNorfolk -2.698e-01 1.909e-02 -14.135 < 2e-16 ***
regionRoanoke -3.131e-01 1.909e-02 -16.406 < 2e-16 ***
regionSacramento 6.033e-02 1.909e-02 3.161 0.001576 **
regionSanDiego -1.630e-01 1.909e-02 -8.539 < 2e-16 ***
regionSanFrancisco 2.430e-01 1.909e-02 12.729 < 2e-16 ***
regionSeattle -1.185e-01 1.909e-02 -6.207 5.53e-10 ***
regionSouthCarolina -1.580e-01 1.909e-02 -8.278 < 2e-16 ***
regionSouthCentral -4.616e-01 1.911e-02 -24.147 < 2e-16 ***
regionSoutheast -1.660e-01 1.911e-02 -8.687 < 2e-16 ***
regionSpokane -1.154e-01 1.909e-02 -6.046 1.51e-09 ***
regionStLouis -1.306e-01 1.909e-02 -6.842 8.07e-12 ***
regionSyracuse -4.071e-02 1.909e-02 -2.133 0.032948 *
regionTampa -1.524e-01 1.909e-02 -7.984 1.50e-15 ***
regionTotalUS -2.574e-01 2.084e-02 -12.350 < 2e-16 ***
regionWest -2.895e-01 1.909e-02 -15.163 < 2e-16 ***
regionWestTexNewMexico -2.967e-01 1.913e-02 -15.509 < 2e-16 ***
quarter2 8.056e-02 5.411e-03 14.887 < 2e-16 ***
quarter3 2.183e-01 5.415e-03 40.325 < 2e-16 ***
quarter4 1.626e-01 5.378e-03 30.244 < 2e-16 ***
year2016 -3.908e-02 4.749e-03 -8.229 < 2e-16 ***
year2017 1.354e-01 4.739e-03 28.580 < 2e-16 ***
year2018 8.441e-02 8.477e-03 9.957 < 2e-16 ***
x_large_bags -1.140e-06 5.468e-07 -2.085 0.037091 *
year2016:x_large_bags 1.419e-06 5.571e-07 2.547 0.010880 *
year2017:x_large_bags 1.642e-06 5.537e-07 2.966 0.003023 **
year2018:x_large_bags 1.461e-06 5.948e-07 2.456 0.014054 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2481 on 18184 degrees of freedom
Multiple R-squared: 0.6216, Adjusted R-squared: 0.6203
F-statistic: 466.8 on 64 and 18184 DF, p-value: < 2.2e-16
So it looks like model5pa, with type, region, quarter, year, x_large_bags and the type:region interaction, is the best, with a moderate gain in multiple R-squared due to the interaction. However, the p-values of the individual interaction coefficients vary widely, so we need to test the significance of the interaction as a whole.
anova(model5, model5pa)
Analysis of Variance Table
Model 1: average_price ~ type + region + quarter + year + x_large_bags
Model 2: average_price ~ type + region + quarter + year + x_large_bags +
type:region
Res.Df RSS Df Sum of Sq F Pr(>F)
1 18187 1120.1
2 18134 1002.7 53 117.43 40.07 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Neat, it looks like including the interaction is statistically justified.
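As a rough check on where the F value in that table comes from, the sketch below recomputes it from the rounded RSS and degrees-of-freedom figures printed by anova(). The variable names are illustrative only, and because the inputs are rounded the result only approximately matches the table.

```r
# Values copied (rounded) from the anova() table above
rss_reduced <- 1120.1  # residual sum of squares, model without the interaction
rss_full    <- 1002.7  # residual sum of squares, model with the interaction
df_extra    <- 53      # extra coefficients added by the interaction
df_resid    <- 18134   # residual degrees of freedom of the larger model

# F = (drop in RSS per extra coefficient) / (residual variance of larger model)
f_stat <- ((rss_reduced - rss_full) / df_extra) / (rss_full / df_resid)

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_stat, df_extra, df_resid, lower.tail = FALSE)

f_stat
p_value
```

The single F-test asks whether all 53 interaction coefficients are jointly zero, which is why it can be decisive even when many of the individual coefficient p-values are not.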
Automated approach
# we're putting set.seed() in here for reproducibility, but you shouldn't include
# this in production code
set.seed(42)
n_data <- nrow(trimmed_avocados)
test_index <- sample(1:n_data, size = n_data * 0.2)
test <- slice(trimmed_avocados, test_index)
train <- slice(trimmed_avocados, -test_index)
# sanity check
nrow(test) + nrow(train) == n_data
[1] TRUE
nrow(test)
[1] 3649
nrow(train)
[1] 14600
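A small aside on the split sizes: 20% of 18249 rows is not a whole number, and sample() truncates a fractional size, which is why the test set has 3649 rows. The sketch below reproduces this with a plain index vector instead of the avocado data; explicit_size is a name introduced here for illustration.

```r
n_rows <- 18249                 # nrow(test) + nrow(train) from the output above

n_rows * 0.2                    # 3649.8, not a whole number

set.seed(42)
idx <- sample(1:n_rows, size = n_rows * 0.2)
length(idx)                     # the fractional size has been truncated

# Being explicit about the rounding makes the intent clearer
explicit_size <- floor(n_rows * 0.2)
explicit_size
```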
glmulti_fit <- glmulti(
average_price ~ .,
data = train,
level = 1, # 1 = main effects only (no pairwise interactions), 2 = also include pairwise interactions
minsize = 1, # no min size of model
maxsize = -1, # -1 = no max size of model
marginality = TRUE, # marginality here means the same as 'strongly hierarchical' interactions, i.e. include pairwise interactions only if both predictors present in the model as main effects.
method = "h", # try exhaustive search, or could use "g" for genetic algorithm instead
crit = bic, # criteria for model selection is BIC value (lower is better)
plotty = FALSE, # don't plot models as function runs
report = TRUE, # do produce reports as function runs
confsetsize = 10, # return best 10 solutions
fitfunction = lm # fit using the `lm` function
)
Initialization...
TASK: Exhaustive screening of candidate set.
Fitting...
After 50 models:
Best model: average_price~1+total_volume+x4225+x4770+small_bags
Crit= 14290.9309630974
Mean crit= 14302.7404848229
After 100 models:
Best model: average_price~1+total_volume+x4046+x4770+large_bags
Crit= 14287.0201451487
Mean crit= 14295.0164087599
After 150 models:
Best model: average_price~1+x4046+x4225+x4770+x_large_bags
Crit= 14282.9391871136
Mean crit= 14288.655212267
After 200 models:
Best model: average_price~1+total_volume+x4225+x4770+small_bags+x_large_bags
Crit= 14279.4193694914
Mean crit= 14287.2170591254
After 250 models:
Best model: average_price~1+total_volume+x4225+x4770+small_bags+x_large_bags
Crit= 14279.4193694914
Mean crit= 14285.8311251354
After 300 models:
Best model: average_price~1+x4225+region
Crit= 11937.9201055248
Mean crit= 11948.2560227565
After 350 models:
Best model: average_price~1+x4225+region
Crit= 11937.9201055248
Mean crit= 11946.7914993621
After 400 models:
Best model: average_price~1+total_volume+x4046+x_large_bags+region
Crit= 11925.5711979638
Mean crit= 11936.6091736735
After 450 models:
Best model: average_price~1+total_volume+x4225+x_large_bags+region
Crit= 11921.550556659
Mean crit= 11926.6655997315
After 500 models:
Best model: average_price~1+total_volume+x4225+x_large_bags+region
Crit= 11921.550556659
Mean crit= 11926.6075265317
After 550 models:
Best model: average_price~1+type+x4046+x4225+x4770
Crit= 7734.8593327961
Mean crit= 7793.00094370359
After 600 models:
Best model: average_price~1+type+total_volume+x4225+x4770+small_bags
Crit= 7697.79186941605
Mean crit= 7707.77632276868
After 650 models:
Best model: average_price~1+type+total_volume+x4046+x4225+x4770+small_bags+large_bags
Crit= 7665.37294598611
Mean crit= 7691.99478130502
After 700 models:
Best model: average_price~1+type+total_volume+x4046+x4225+total_bags+small_bags+large_bags
Crit= 7665.32155031575
Mean crit= 7671.67032729922
After 750 models:
Best model: average_price~1+type+total_volume+x4225+x4770+small_bags+x_large_bags
Crit= 7657.97738881932
Mean crit= 7664.22564265621
After 800 models:
Best model: average_price~1+type+total_volume+x4225+region
Crit= 3977.52101108293
Mean crit= 5088.37870955926
After 850 models:
Best model: average_price~1+type+total_volume+small_bags+region
Crit= 3964.67515907674
Mean crit= 3970.85743694413
After 900 models:
Best model: average_price~1+type+total_volume+small_bags+region
Crit= 3964.67515907674
Mean crit= 3969.31227685631
After 950 models:
Best model: average_price~1+type+x4225+x_large_bags+region
Crit= 3918.18091232781
Mean crit= 3925.01550491875
After 1000 models:
Best model: average_price~1+type+x4225+x_large_bags+region
Crit= 3918.18091232781
Mean crit= 3924.49480426776
After 1050 models:
Best model: average_price~1+type+x4225+x_large_bags+region
Crit= 3918.18091232781
Mean crit= 3924.14806552202
After 1100 models:
Best model: average_price~1+type+x4225+x_large_bags+region
Crit= 3918.18091232781
Mean crit= 3924.14806552202
After 1150 models:
Best model: average_price~1+type+x4225+x_large_bags+region
Crit= 3918.18091232781
Mean crit= 3924.14806552202
After 1200 models:
Best model: average_price~1+type+x4225+x_large_bags+region
Crit= 3918.18091232781
Mean crit= 3924.14806552202
After 1250 models:
Best model: average_price~1+type+x4225+x_large_bags+region
Crit= 3918.18091232781
Mean crit= 3924.14806552202
After 1300 models:
Best model: average_price~1+type+x4225+x_large_bags+region
Crit= 3918.18091232781
Mean crit= 3924.14806552202
After 1350 models:
Best model: average_price~1+type+x4225+x_large_bags+region
Crit= 3918.18091232781
Mean crit= 3924.14806552202
After 1400 models:
Best model: average_price~1+type+x4225+x_large_bags+region
Crit= 3918.18091232781
Mean crit= 3924.14806552202
After 1450 models:
Best model: average_price~1+type+x4225+x_large_bags+region
Crit= 3918.18091232781
Mean crit= 3924.14806552202
After 1500 models:
Best model: average_price~1+type+x4225+x_large_bags+region
Crit= 3918.18091232781
Mean crit= 3924.14806552202
After 1550 models:
Best model: average_price~1+type+x4225+x_large_bags+region
Crit= 3918.18091232781
Mean crit= 3924.14806552202
After 1600 models:
Best model: average_price~1+type+x4225+x_large_bags+region
Crit= 3918.18091232781
Mean crit= 3924.14806552202
After 1650 models:
Best model: average_price~1+type+x4225+x_large_bags+region
Crit= 3918.18091232781
Mean crit= 3924.14806552202
After 1700 models:
Best model: average_price~1+type+x4225+x_large_bags+region
Crit= 3918.18091232781
Mean crit= 3924.14806552202
After 1750 models:
Best model: average_price~1+type+x4225+x_large_bags+region
Crit= 3918.18091232781
Mean crit= 3924.14806552202
After 1800 models:
Best model: average_price~1+type+x4225+x_large_bags+region
Crit= 3918.18091232781
Mean crit= 3924.14806552202
After 1850 models:
Best model: average_price~1+type+x4225+x_large_bags+region
Crit= 3918.18091232781
Mean crit= 3924.14806552202
After 1900 models:
Best model: average_price~1+type+year+x4225+region
Crit= 2782.77393470439
Mean crit= 2786.76963771313
After 1950 models:
Best model: average_price~1+type+year+x4225+region
Crit= 2782.77393470439
Mean crit= 2785.83493132762
After 2000 models:
Best model: average_price~1+type+year+total_volume+x_large_bags+region
Crit= 2740.75749578209
Mean crit= 2749.79339051662
After 2050 models:
Best model: average_price~1+type+year+total_volume+x_large_bags+region
Crit= 2740.75749578209
Mean crit= 2747.16085350457
After 2100 models:
Best model: average_price~1+type+year+total_volume+x_large_bags+region
Crit= 2740.75749578209
Mean crit= 2745.77695032743
After 2150 models:
Best model: average_price~1+type+year+total_volume+x_large_bags+region
Crit= 2740.75749578209
Mean crit= 2745.77695032743
After 2200 models:
Best model: average_price~1+type+year+total_volume+x_large_bags+region
Crit= 2740.75749578209
Mean crit= 2745.77695032743
After 2250 models:
Best model: average_price~1+type+year+total_volume+x_large_bags+region
Crit= 2740.75749578209
Mean crit= 2745.77695032743
After 2300 models:
Best model: average_price~1+type+year+total_volume+x_large_bags+region
Crit= 2740.75749578209
Mean crit= 2745.77695032743
After 2350 models:
Best model: average_price~1+type+year+total_volume+x_large_bags+region
Crit= 2740.75749578209
Mean crit= 2745.77695032743
After 2400 models:
Best model: average_price~1+type+year+total_volume+x_large_bags+region
Crit= 2740.75749578209
Mean crit= 2745.77695032743
summary(glmulti_fit)
So the lowest BIC model with main effects is average_price ~ type + year + quarter + total_volume + x_large_bags + region. Let’s have a look at possible extensions to this. We’re going to deliberately push on to the point where models start to overfit (as tested by the RMSE on the test set), so that we can see what this looks like.
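To make the selection criterion concrete, here is a minimal sketch of how BIC is computed for a linear model, using the built-in mtcars data as a stand-in for the avocado data. It uses the base R BIC() function; it is an assumption here that the bic criterion passed to glmulti follows the same definition.

```r
# Toy model on a built-in dataset (not the avocado data)
fit <- lm(mpg ~ wt + hp, data = mtcars)

n <- nobs(fit)                   # number of observations
k <- attr(logLik(fit), "df")     # estimated parameters: coefficients + residual variance

# BIC = -2 * log-likelihood + (number of parameters) * log(n)
bic_manual <- -2 * as.numeric(logLik(fit)) + k * log(n)

bic_manual
BIC(fit)                         # base R's implementation, for comparison
```

The log(n) penalty grows with the sample size, so each extra term has to improve the fit by more than it would under AIC before it lowers the criterion.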
# start with an empty tibble with typed columns so rows can be appended
results <- tibble(
name = character(), bic = numeric(), rmse_train = numeric(), rmse_test = numeric()
)
# lowest BIC model with main effects
lowest_bic_model <- lm(average_price ~ type + year + quarter + total_volume + x_large_bags + region, data = train)
results <- results %>%
add_row(
tibble_row(
name = "lowest bic",
bic = bic(lowest_bic_model),
rmse_train = rmse(lowest_bic_model, train),
rmse_test = rmse(lowest_bic_model, test)
)
)
# try adding in all possible pairs with these main effects
lowest_bic_model_all_pairs <- lm(average_price ~ (type + year + quarter + total_volume + x_large_bags + region)^2, data = train)
results <- results %>%
add_row(
tibble_row(
name = "lowest bic all pairs",
bic = bic(lowest_bic_model_all_pairs),
rmse_train = rmse(lowest_bic_model_all_pairs, train),
rmse_test = rmse(lowest_bic_model_all_pairs, test)
)
)
# try a model with all main effects
model_all_mains <- lm(average_price ~ ., data = train)
results <- results %>%
add_row(
tibble_row(
name = "all mains",
bic = bic(model_all_mains),
rmse_train = rmse(model_all_mains, train),
rmse_test = rmse(model_all_mains, test)
)
)
# try a model with all main effects and all pairs
model_all_pairs <- lm(average_price ~ .^2, data = train)
results <- results %>%
add_row(
tibble_row(
name = "all pairs",
bic = bic(model_all_pairs),
rmse_train = rmse(model_all_pairs, train),
rmse_test = rmse(model_all_pairs, test)
)
)
# try a model with all main effects, all pairs and one triple (this is getting silly)
model_all_pairs_one_triple <- lm(average_price ~ .^2 + region:type:year, data = train)
results <- results %>%
add_row(
tibble_row(
name = "all pairs one triple",
bic = bic(model_all_pairs_one_triple),
rmse_train = rmse(model_all_pairs_one_triple, train),
rmse_test = rmse(model_all_pairs_one_triple, test)
)
)
# try a model with all main effects, all pairs and multiple triples (more silly)
model_all_pairs_multi_triples <- lm(average_price ~ .^2 + region:type:year + region:type:quarter + region:year:quarter, data = train)
results <- results %>%
add_row(
tibble_row(
name = "all pairs multi triples",
bic = bic(model_all_pairs_multi_triples),
rmse_train = rmse(model_all_pairs_multi_triples, train),
rmse_test = rmse(model_all_pairs_multi_triples, test)
)
)
results <- results %>%
pivot_longer(cols = bic:rmse_test, names_to = "measure", values_to = "value") %>%
mutate(
name = fct_relevel(
as_factor(name),
"lowest bic", "all mains", "lowest bic all pairs", "all pairs", "all pairs one triple", "all pairs multi triples"
)
)
results %>%
filter(measure == "bic") %>%
ggplot(aes(x = name, y = value)) +
geom_col(fill = "steelblue", alpha = 0.7) +
labs(
x = "model",
y = "bic"
) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
geom_hline(aes(yintercept = 0))
BIC is telling us here that if we took our main effects model with the lowest BIC and added in all possible pairwise interactions, this would likely still improve the model for predictive purposes. BIC suggests that this ‘lowest bic all pairs’ model will offer the best predictive performance without overfitting, with all the other models scoring noticeably worse.
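The ‘all pairs’ models above rely on the `^2` operator in R formulas. As a small illustration with placeholder variable names (y, a, b and c are hypothetical, not columns of the avocado data), terms() shows what such a formula expands to:

```r
# (a + b + c)^2 expands to all main effects plus all pairwise interactions
pairs_formula <- y ~ (a + b + c)^2
pair_terms <- attr(terms(pairs_formula), "term.labels")
pair_terms

# A three-way interaction has to be added explicitly
triple_formula <- y ~ (a + b + c)^2 + a:b:c
triple_terms <- attr(terms(triple_formula), "term.labels")
triple_terms
```

With factor predictors that have many levels, such as region, each interaction term can contribute a large number of coefficients, so the model size grows much faster than the number of terms suggests.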
results %>%
filter(measure != "bic") %>%
ggplot(aes(x = name, y = value, fill = measure)) +
geom_col(position = "dodge", alpha = 0.7) +
labs(
x = "model",
y = "rmse"
) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
The lowest test RMSE is obtained for the ‘lowest bic all pairs’ model. It increases thereafter for the more complex models, which suggests that these models are overfitting the training data.
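For reference, the train and test RMSE comparison can be sketched in base R on a toy dataset. The helper rmse_by_hand() below is a hypothetical stand-in written for this illustration, on the assumption that the rmse() function used above computes the same quantity; mtcars replaces the avocado data, and the two models are arbitrary examples of a simple and a more complex fit.

```r
set.seed(1)

# Hold out 8 of the 32 rows of mtcars as a toy test set
test_rows <- sample(nrow(mtcars), size = 8)
train_toy <- mtcars[-test_rows, ]
test_toy  <- mtcars[test_rows, ]

# Root mean squared error of a model's predictions on a given data frame
rmse_by_hand <- function(model, data, response) {
  errors <- data[[response]] - predict(model, newdata = data)
  sqrt(mean(errors^2))
}

simple_fit  <- lm(mpg ~ wt, data = train_toy)
complex_fit <- lm(mpg ~ (wt + hp + disp + qsec)^2, data = train_toy)

toy_results <- data.frame(
  name       = c("simple", "complex"),
  rmse_train = c(rmse_by_hand(simple_fit, train_toy, "mpg"),
                 rmse_by_hand(complex_fit, train_toy, "mpg")),
  rmse_test  = c(rmse_by_hand(simple_fit, test_toy, "mpg"),
                 rmse_by_hand(complex_fit, test_toy, "mpg"))
)
toy_results
```

Training RMSE can only go down as terms are added to a nested least-squares model, so it is the test RMSE column that signals when extra complexity has stopped helping.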